Introduction

Recently, Correia, Luck and Verner (2020) (henceforth CLV) put forward a very interesting, policy relevant paper that, among other things, discuss whether Non Pharmaceutical Interventions (NPIs) mitigated the adverse economic effects of the 1918 Spanish Flu pandemic on economic growth (this is only one part of their paper, to be clear). They find some suggestive evidence that cities that intervened earlier and more aggressively during the 1918 Spanish Flu pandemic did not perform worse in terms of economic activity and, if anything, grow faster after the pandemic is over. As a result, the authors interpret these findings as suggestive evidence that NPIs may mitigate the adverse economic consequences of a pandemic.

Motivated by the findings of CLV, Lilley, Lilley and Rinaldi (2020) (LLR) decided to analyze similar data to shed light on the “robustness” of the findings reported by CLV. LLR find that a portion of the results in CLV is not robust to changes in the methodology initially adopted, in the sense that, in LLR words, “the estimated effect of NPIs on economic growth are a noisy zero; we can neither rule out substantial positive nor negative effects of NPIs on employment growth”. LLR make all their analysis and data publicly available at https://almlgr.github.io/. This is open-science at its best!

In this document, my main goal is to briefly revisit this on-going debate by using some of the difference-in-differences methods that I have been working on in the last couple of years with Brantly Callaway and Jun B. Zhao. This would hopefully allow me to better understanding the “source” of the discussion. Having said that, I would like to stress that this exercise is mostly for my own understanding of the methodological issues behind the debate. I do not have expert knowledge about the 1918 Spanish flu pandemic, and the approach I take here may be full of caveats. As so, please interpret everything I say here with a grain of salt.

In the rest of this note, I do the following:

  1. I summarize part of this debate, highlighting how the CLV and LLR arrive at somehow different findings.

  2. I attempt to explain some challenges behind the methodology adopted:

    i.The “treatment” variables underlying CLV and LLR analysis are not binary as is usually the case in “traditional” difference-in-difference (DiD) studies. Because of that, at least to me, it is not entirely clear what type of “parallel trends assumptions” these papers are relying on to identify causal effects. This may play a role on how we should assess the “reliability” of the results. In fact, I find that checking for zero pre-trends as a way to assess the credibility of the underlying assumptions may not be a good idea here, as several of the covariates are measured in “later” pre-treatment periods(e.g. in 1914 and 1917), so using them as a covariates for “earlier” pre-treatment periods (e.g. 1904 and 1899) may not be appropriate.

    1. In these “non-standard” DiD setups, at first it is not clear if the coefficient underlying the outcome regression models used by both CLV and LLR recover “easy-to-understand” treatment effect parameters when treatment effects are allowed to heterogeneous with respect to the treatment intensity. My concern is that the TWFE regression underlying the CLV and LLR analysis may be implicitly imposing some homonegeity and linearity conditions that perhaps are not warranted. The usual “best linear approximation”-type interpretation may not be warranted here as the goal is to estimate causal effects.

    2. The inference procedures behind both CLV and LLR are pointwise and not simultaneous. This means that, in practice, one may find statistically significant effects by chance as these pointwise confidence intervals do not account for multiple-testing problems.

  3. In order to partially address the challenges described in points 1 and 2 above, I discretize the treatment variables to map the analysis back to the more standard DiD analysis. Although this may be full of caveats and potential drawbacks, it allow us to use some methods that are perhaps more “robust”. The results from this exercise provide suggestive evidence favoring the CLV original findings that adopting more intensive NPIs can help alleviating the negative economic effects of a pandemic.

Summary of the “debate”

In this section, I try to briefly summarize part of the “debate” between CLV and LLR that relates to pre-trends and model specifications. This is a simplified version of the discussion, as I am not revisiting the potential issues related to redefinition of city borders brought up by LLR; I strongly recommend you to read the entire exchange between CLV and LLR to get all their points. Here, my goal is to provide a “broad picture” that is more related to econometrics. I will also replicate their findings, using their proposed regression specifications, and using the data that LLR made available here. Reproducible research, YAY!!!

Replicating CLV

Using data from 43 U.S. cities from 1909 to 1923, CLV find that NPIs reduced the adverse economic effects of the 1918 Spanish Flu pandemic. They use two different outcome measures: \((i)\) city level manufacturing (log) output, and \((ii)\) city level manufacturing (log) employment. They also use two different NPI measures: \((a)\) NPI intensity, measured as the total cumulative days of NPIs per city, and \((b)\) Speed of NPI, measured as the number of days between when the weekly excess death rate exceeds two times the baseline influenza and pneumonia death rate and the date that an NPI measure is activated; see Correia, Luck and Verner (2020) for additional details.

In order to arrive at these conclusions, CLV make use of the very popular and widely adopted TWFE event-study linear regression specification: \[ Y_{c,t} = \alpha_c + \tau_t + \sum_{j \not= 1914}\beta_j NPI_{c,1918} 1\{j=t\} + \sum_{j \not= 1914}\gamma_j X_{c} 1\{j=t\} + \varepsilon_{c,t} \] where \(Y_{c,t}\) is the log-outcome of interest, \(\alpha_c\) is the city fixed-effect, \(\tau_t\) is the year fixed-effect, \(\varepsilon_{c,t}\) is the idiosyncratic error term, and \(X_c\) is a vector of time-invariant covariates including log 1910 population, percentage of the city population employed in manufacturing in 1914 and influenza mortality in 1917, as well as state-level measures of the 1910 urban share, agricultural employment share and per capital income.

The parameters of interest in the aforementioned TWFE linear regression are the \(\beta_j\)’s, also known as the event-study coefficients. The \(\beta_j\)’s with \(j \ge1918\) are usually interpreted as average treatment effect parameters at time period \(j\), and the \(\beta_j\)’s with \(j<1918\) are usually used to asses the credibility of a ``parallel trends assumption’’ (more on this below).

In what follows, I run the regressions associated with these specifications using both outcomes and both NPIs measures and plot the \(\beta_j\) coefficients together with their pointwise 90% confidence intervals. For the confidence intervals, I use standard errors clustered at the city-level. I report 90% confidence intervals because, in our opinion, with 43 cities, it would perhaps be “too much” to expect 95% confidence intervals.

# load packages and set ggplot template
devtools::install_github("pedrohcgs/DRDID", dependencies = T)
devtools::install_github("bcallaway11/did", dependencies = T)
library(DRDID)
library(did)
library(haven)            # package to read Stata file
library(lfe)
library(fastDummies)
library(ggthemes)
library(tidyverse)
library(patchwork)
#theme_set(theme_bw())
set.seed(20200517)
#-------------------------------------------------------------------------------------------------
# import the data in long format (save after run Analysis.do )
sflu_orig <- haven::read_dta("sf_processed.dta")
# get data without missing
sflu <- subset(sflu_orig, (is.na(sflu_orig$DaysofNPI) == F) == 1)

# Get CLV subsample
sflu_correia <- subset(sflu, (sflu$Year >= 1909) * (sflu$Year <= 1924) == 1)
#-------------------------------------------------------------------------------------------------
# Estimate Event-Study regressions
# Interact covariates with times
# Days of NPI
covs_days = model.matrix(~ (DaysofNPI + AgricultureEmpShare1910+
                                   ManufacturingEmpShare1914+
                                   UrbanPopShare1910  +
                                   IncomePerCapita1910  +
                                   Pop1910_ln  +
                                   Mortality1917):factor(Year),
                         data = sflu_correia)
# Speed of NPI
covs_speed = model.matrix(~(SpeedofNPI + AgricultureEmpShare1910+
                                   ManufacturingEmpShare1914+
                                   UrbanPopShare1910  +
                                   IncomePerCapita1910  +
                                   Pop1910_ln  +
                                   Mortality1917):factor(Year),
                         data = sflu_correia)

# Drop interaction with 1914 (baseline) and also the intercept
covs_days = covs_days[,-grep(")1914$", colnames(covs_days))]
covs_days = covs_days[,-1]

covs_speed = covs_speed[,-grep(")1914$", colnames(covs_speed))]
covs_speed = covs_speed[,-1]

# Manufacturing output and days
twfe_days_out <- lfe::felm(CityManuOutput_ln ~ covs_days | Year + City| 0 | City,
                                    data = sflu_correia)

# Manufacturing employment and days
twfe_days_emp <- lfe::felm(CityManuEmp_ln ~ covs_days | Year + City| 0 | City,
                                    data = sflu_correia)

# Manufacturing output and speed
twfe_speed_out <- lfe::felm(CityManuOutput_ln ~ covs_speed | Year + City| 0 | City,
                                    data = sflu_correia)

# Manufacturing employment and speed
twfe_speed_emp <- lfe::felm(CityManuEmp_ln ~ covs_speed | Year + City| 0 | City,
                                    data = sflu_correia)

# function to make plots
makeplot1 <- function(model, title) {
  broom::tidy(model, conf.int = TRUE, conf.level = 0.90) %>%
    filter(row_number() <= 4) %>%
    mutate(year = c(1909, 1919, 1921, 1923)) %>%
    bind_rows(tibble(year = 1914, estimate = 0, std.error = 0, 
                     conf.low = 0, conf.high = 0)) %>% 
    filter(!is.na(estimate)) %>% 
    ggplot(aes(x = year, y = 100 * estimate)) +
    ggtitle(title) + 
    geom_hline(yintercept = 0,color = "red", linetype  = "dashed") +
    scale_x_continuous(breaks = c(1909, 1914, 1919, 1921, 1923)) +
    geom_pointrange(aes(ymin = 100*conf.low, ymax = 100*conf.high), show.legend = FALSE)+
    theme(plot.title = element_text(hjust = 0.5, size=12))+
    labs(x = "Year", y = "Coefficient Estimate (x 100)")

}
# make our four plots
plot1 <- makeplot1(twfe_days_emp, "NPI intensity and log manufacturing employment.")
plot2 <- makeplot1(twfe_speed_emp, "NPI speed and log manufacturing employment.")
plot3 <- makeplot1(twfe_days_out, "NPI intensity and log manufacturing output")
plot4 <- makeplot1(twfe_speed_out, "NPI speed and log manufacturing output")

# put into a single plot
plot1 +  plot3 + plot2 + plot4 + plot_layout(ncol = 2) +
   plot_annotation(
  caption = 'Confidence intervals are based analytical formula. 90% pointwise confidence intervals displayed in black.')