class: title-slide

# Econ 520: Data Science for Economists
## Lecture 15: Inverse Probability Weighted Estimators

<br>

<p align=center> Pedro H. C. Sant'Anna </p>
<div style="margin-top: -.7cm;"></div>
<p align=center> Emory University </p>

<br>

<p align=center> Spring 2024 </p>

---
class: center, middle
name: prologue

# Main Goal

<html><div style='float:left'></div><hr color='#EB811B' size=1px width=1100px></html>

---
# Main Goals for next two lectures

<style type="text/css">
.small-code .remark-code{
  font-size: 44%
}
.medium-code .remark-code{
  font-size: 55%
}
</style>

- The main goal of this lecture is to discuss how we can estimate and make inference about the ATE and ATT using inverse probability weighted (IPW) estimators, under unconfoundedness.

- We will cover the core principles behind these estimators.

- Today, we will essentially cover the intuition behind IPW estimators, using (complex) experiments.

- Next lecture, we will build on this intuition and discuss the theory behind it.

- We will talk about maximum likelihood estimators for binary dependent variables.

- We will talk about the importance of the overlap assumption.

---
class: center, middle
name: CI

# Quick recall: <br> ATE and ATT via regression adjustments

<html><div style='float:left'></div><hr color='#EB811B' size=1px width=1100px></html>

---
# Unconfoundedness + Overlap

- In the previous lectures, we discussed how we can estimate the ATE and ATT using linear regression adjustments.

- The basis of the argument was the unconfoundedness and overlap assumptions.

- With these two assumptions, we can write the ATE and ATT as

`$$\begin{eqnarray*}
ATE &\equiv& E[Y(1)- Y(0)] ~~~~~~~~~~~~= E[m_{D=1}(X)] - E[m_{D=0}(X)],\\
ATT &\equiv& E[Y(1)- Y(0)|D=1] = E[Y|D=1] - E[m_{D=0}(X)|D=1],
\end{eqnarray*}$$`

  where `\(m_{D=d}(X) \equiv E[Y|D=d,X]\)` is a conditional expectation function (that can be estimated with regressions).

---
# Plug-in estimators

- We can estimate the ATE and ATT by plugging in estimates of these conditional expectations.

- To do that, all we need is a model for `\(E[Y|D=1,X]\)` and `\(E[Y|D=0,X]\)`, i.e., models for `\(m_{D=1}(X)\)` and `\(m_{D=0}(X)\)`.

- We can pose a linear model for `\(E[Y|D=d,X]\)`, i.e., `\(m_{D=d}(X) = X'\beta^{D=d}\)`, and use linear regression among units with `\(D_i=d\)` to estimate these unknown coefficients, i.e., run the following regression:

`$$\begin{eqnarray*}
Y_i = X_i'\beta^{D=d} + \varepsilon^{D=d}_i ~ \text{among units with } D_i=d.
\end{eqnarray*}$$`

---
# Plug-in estimators

- Once we have estimates of `\(\beta^{D=1}\)` and `\(\beta^{D=0}\)`, we can plug them into the formulas for the ATE and ATT to get the plug-in estimators of these parameters:

`$$\begin{eqnarray*}
\widehat{ATE} &=& E_n[X'\widehat{\beta}^{D=1}] - E_n[X'\widehat{\beta}^{D=0}], \\
\widehat{ATT} &=& E_n[X'\widehat{\beta}^{D=1}|D=1] - E_n[X'\widehat{\beta}^{D=0}|D=1],
\end{eqnarray*}$$`

  where, for a generic variable `\(Z\)`, `\(E_n[Z]\)` and `\(E_n[Z|D=1]\)` are the sample averages of `\(Z\)` among all units and among treated units, respectively.

- We can use bootstrap procedures or compute analytical standard errors of these estimators via the Delta Method (and influence functions) to make inference about the ATE and ATT.

- We have seen this in the previous lectures.
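---
# Plug-in estimators, in practice

- Before we move on, here is a minimal sketch of these plug-in regression-adjustment estimators in `R`. The simulated design below is purely illustrative (by construction, the true ATE and ATT both equal 1):

```r
# simulate some data (hypothetical DGP, just for illustration)
set.seed(520)
n <- 10000
x <- rnorm(n)
d <- (runif(n) <= plogis(x))                 # treatment depends on x
y <- d * (1 + x) + (1 - d) * x + rnorm(n)    # true ATE = ATT = 1
df <- data.frame(y = y, d = d, x = x)

# Fit the outcome regression separately within each treatment group
fit1 <- lm(y ~ x, data = subset(df, d == TRUE))
fit0 <- lm(y ~ x, data = subset(df, d == FALSE))

# Predict both regression functions for every unit
m1_hat <- predict(fit1, newdata = df)
m0_hat <- predict(fit0, newdata = df)

# Plug-in estimators of the ATE and ATT
ATE_ra <- mean(m1_hat) - mean(m0_hat)
ATT_ra <- mean(df$y[df$d]) - mean(m0_hat[df$d])
```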
---
class: center, middle
name: catch

# But what if my regression model is not that great?

<html><div style='float:left'></div><hr color='#EB811B' size=1px width=1100px></html>

---
# What if we have more flexible models for `\(m_{D=d}(X)\)`?

- .hi[What if we were to be more flexible about the functional form of the conditional expectations?]

- .hi[What would be the implications of that?]

- .hi[Would it be a good idea?]

- .hi[Are there challenges?]

- .hi[How would you implement these more flexible models?]

---
class: center, middle
name: IPW

# Is there another route to get the ATE and ATT <br> without regression adjustments?

<html><div style='float:left'></div><hr color='#EB811B' size=1px width=1100px></html>

---
# Let's think about an experiment

- Let's entertain some cool thoughts here.

- Suppose we were given a blank check and could run any offline experiment we want.

- We can start with a simple randomized controlled trial (RCT): `\(Y(1), Y(0) \perp D\)`.

- To deploy this, all we need to do is decide the proportion of units that should be treated, and then roll the dice to decide who gets treated.

- We can then estimate the ATE as the difference in average outcomes between treated and untreated units, i.e.,

`$$\begin{eqnarray*}
\widehat{ATE} &=& E_n[Y|D=1] - E_n[Y|D=0].
\end{eqnarray*}$$`

---
# Let's open up this estimator a bit more:

<small>
`$$\begin{eqnarray*}
\widehat{ATE} &=& E_n[Y|D=1] - E_n[Y|D=0]\\
&=& \dfrac{1}{n_1}\sum_{i:D_i=1} Y_i - \dfrac{1}{n_0}\sum_{i:D_i=0} Y_i\\
&=& \dfrac{1}{n_1}\sum_{i=1}^n D_i Y_i - \dfrac{1}{n_0}\sum_{i=1}^n (1 - D_i) Y_i\\
&=& \dfrac{\sum_{i=1}^n D_i Y_i}{\sum_{j=1}^n D_j} - \dfrac{\sum_{i=1}^n (1 - D_i) Y_i}{\sum_{j=1}^n (1 - D_j)}\\
&=& \dfrac{n^{-1}\sum_{i=1}^n D_i Y_i}{n^{-1}\sum_{j=1}^n D_j} - \dfrac{n^{-1}\sum_{i=1}^n (1 - D_i) Y_i}{n^{-1}\sum_{j=1}^n (1 - D_j)}\\
&=& \dfrac{1}{n}\sum_{i=1}^n \dfrac{D_i}{\widehat{p}} Y_i - \dfrac{1}{n}\sum_{i=1}^n \dfrac{(1 - D_i)}{1 - \widehat{p}} Y_i\\
&=& E_n\left[\dfrac{D_i}{\widehat{p}} Y_i\right] - E_n\left[\dfrac{(1 - D_i)}{1 - \widehat{p}} Y_i\right],
\end{eqnarray*}$$`

with `\(\widehat{p} = n^{-1}\sum_{i=1}^n D_i\)`.
</small>

---
# But what is going on?

- The above algebra shows that, in this simple RCT, the comparison-of-means estimator for the ATE can be written as a weighted average of the outcomes, where the weights are the inverse of the propensity score (which is a constant in this design):

`$$\begin{eqnarray*}
\widehat{ATE} &=& E_n[Y|D=1] - E_n[Y|D=0]\\
&=& E_n\left[\dfrac{D_i}{\widehat{p}} Y_i\right] - E_n\left[\dfrac{(1 - D_i)}{1 - \widehat{p}} Y_i\right] \\
&=& E_n\left[\bigg(\dfrac{D_i}{\widehat{p}} - \dfrac{(1 - D_i)}{1 - \widehat{p}}\bigg)Y_i\right].
\end{eqnarray*}$$`

- This is the idea behind the Inverse Probability Weighting (IPW) estimator.

- Each group is weighted by the inverse of the probability of being in that group.
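---
# Checking this algebra numerically

- As a quick sanity check, here is a minimal sketch in `R` verifying that, in a simple RCT, the comparison of means and the IPW formula with `\(\widehat{p}\)` coincide (the simulated data is purely illustrative):

```r
set.seed(520)
n <- 1000
d <- (runif(n) <= 0.4)   # simple RCT: everyone treated with probability 0.4
y <- d * rnorm(n, mean = 1) + (1 - d) * rnorm(n)

# Comparison of means
est_dim <- mean(y[d]) - mean(y[!d])

# IPW with the estimated (constant) propensity score
p_hat <- mean(d)
est_ipw <- mean(d * y / p_hat) - mean((1 - d) * y / (1 - p_hat))

all.equal(est_dim, est_ipw)   # TRUE: the two estimators are identical
```

- The two estimators agree exactly in any sample, not just asymptotically.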
---
# What if we have a more complex experiment?

- .hi[But what if our RCT is a bit more complex and depends on the covariates `\(X\)`?]

- This is actually what many real-life experiments look like, as the treatment assignment is often based on the characteristics of the units.

- In this sense, we want the flexibility to assign units to treatment with different probabilities that depend on their characteristics.

- In such cases, we would have `\(Y(1), Y(0) \perp D|X\)`.

- In experiments, we would have this as a design feature (treatment allocation protocol).

- This implies that we know `\(p(X) = E[D|X] = P(D=1 |X)\)`.

---
# Can we still move forward with IPW?

- With these more complex protocols, we unfortunately cannot rely on the simple comparison-of-means estimator to estimate the ATE.

- But we already knew that, as we have spent a lot of time discussing regression adjustment estimators.

- However, we can still use an IPW estimator to estimate the ATE and ATT, as long as we have unconfoundedness and common support (overlap).

- In this example, unconfoundedness holds by design, as `\(Y(1), Y(0) \perp D|X\)`.

- Common support requires that `\(0 < p(X) < 1\)` for all `\(X\)`, which the treatment protocol should ensure.

---
# IPW Estimand

- This follows because, under unconfoundedness and common support, the ATE can be written as:

`$$\begin{eqnarray*}
ATE &=& E[Y(1)] - E[Y(0)]\\
&=& E\left[\dfrac{D}{p(X)}Y\right] - E\left[\dfrac{1 - D}{1 - p(X)} Y\right]\\
&=& \dfrac{E\left[\dfrac{D}{p(X)}Y\right]}{E\left[\dfrac{D}{p(X)}\right]} - \dfrac{E\left[\dfrac{1 - D}{1 - p(X)} Y\right]}{E\left[\dfrac{1-D}{1-p(X)}\right]}.
\end{eqnarray*}$$`

- .hi[Why? Are you able to prove this?]

---
# IPW Estimator with known `\(p(X)\)`

- If you know the propensity score `\(p(X)\)`, you can estimate the ATE using the IPW estimator as follows:

`$$\begin{eqnarray*}
\widehat{ATE}_{ipw} &=& \dfrac{E_n\left[\dfrac{D}{p(X)}Y\right]}{E_n\left[\dfrac{D}{p(X)}\right]} - \dfrac{E_n\left[\dfrac{1 - D}{1 - p(X)} Y\right]}{E_n\left[\dfrac{1-D}{1-p(X)}\right]} \\
&&\\
&=& E_n\left[\widehat{w}_{1} (D,p(X)) ~Y\right] - E_n\left[\widehat{w}_{0} (D,p(X)) ~Y\right],
\end{eqnarray*}$$`

where

<div style="margin-top: -1.5cm;"></div>

`$$\begin{eqnarray*}
\widehat{w}_{1} (D,p(X)) &=& \dfrac{\dfrac{D}{p(X)}}{E_n\left[\dfrac{D}{p(X)}\right]}, ~~~~\text{and}~~~~
\widehat{w}_{0} (D,p(X)) = \dfrac{\dfrac{1 - D}{1 - p(X)}}{E_n\left[\dfrac{1-D}{1-p(X)}\right]}.
\end{eqnarray*}$$`

---
# Addendum: Hájek vs Horvitz-Thompson Estimators

- The above estimator for the ATE is known as the Hájek IPW estimator.

- An important feature is that its weights always average 1 in finite samples.

- Another popular IPW estimator is the Horvitz-Thompson (HT) estimator, which is defined as:

`$$\begin{eqnarray*}
\widehat{ATE}_{ht} &=& E_n\left[\dfrac{D_i}{p(X_i)}Y_i\right] - E_n\left[\dfrac{1 - D_i}{1 - p(X_i)}Y_i\right].
\end{eqnarray*}$$`

- Notice that the HT weights do not necessarily average 1 in finite samples.

- In practice, this can translate into more variability of the HT estimator compared to the Hájek estimator, especially in setups with not-so-strong overlap.

---
# IPW Estimand for the ATT

- Following a similar logic, we can also identify the ATT with the following IPW estimand:

`$$\begin{eqnarray*}
ATT &=& E[Y(1) - Y(0)|D=1]\\
&=& \dfrac{E\left[DY\right]}{E[D]} - \dfrac{E\left[\dfrac{(1-D) p(X)}{1 - p(X)}Y\right]}{E[D]}\\
&=& E\left[w_1^{att}(D)Y\right] - E\left[w_0^{att}(D,p(X))Y\right],
\end{eqnarray*}$$`

with weights given by

<div style="margin-top: -1.0cm;"></div>

`$$\begin{eqnarray*}
w_1^{att}(D) &=& \dfrac{D}{E[D]}, ~~~~\text{and}~~~~
w_0^{att}(D,p(X)) = \dfrac{\dfrac{(1-D) p(X)}{(1-p(X))}}{E\bigg[\dfrac{(1-D) p(X)}{(1-p(X))}\bigg]}.
\end{eqnarray*}$$`

---
# Inference with known `\(p(X)\)`

- If you know the propensity score `\(p(X)\)`, the HT weights can be treated as known, and you can proceed with standard inference procedures.

- If you want to use the normalized weights, i.e., the Hájek IPW estimator, we should acknowledge that the weights are random, and we need to account for this randomness in our inference.

- In this case, we can use the influence function to derive the asymptotic distribution of the IPW estimator.

- You can also use the bootstrap to estimate the standard errors of the IPW estimator.

- A lot of this is available in the `WeightIt` package in `R`. See also the `cobalt` package.
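---
# Hájek vs HT, a quick numerical illustration

- To fix ideas, here is a minimal sketch comparing the two estimators when `\(p(X)\)` is known by design (the simulated experiment below is purely illustrative; true ATE = 1):

```r
set.seed(520)
n <- 1000
x <- rnorm(n)
p_x <- plogis(x)                  # known propensity score, by design
d <- (runif(n) <= p_x)
y <- d * (1 + x) + (1 - d) * x + rnorm(n)   # true ATE = 1

# Horvitz-Thompson: weights may not average 1 in finite samples
w1 <- d / p_x
w0 <- (1 - d) / (1 - p_x)
ATE_ht <- mean(w1 * y) - mean(w0 * y)

# Hajek: normalize the weights so they average exactly 1
ATE_hajek <- mean(w1 * y) / mean(w1) - mean(w0 * y) / mean(w0)
```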
---
class: center, middle
name: CI

# What if `\(p(X)\)` is unknown?

<html><div style='float:left'></div><hr color='#EB811B' size=1px width=1100px></html>

---
# Moving forward with unknown `\(p(X)\)`

- If you do not know the propensity score `\(p(X)\)`, all you have to do is estimate it!

- But how?

- Recall that `\(p(X) = E[D|X] = P(D=1 |X)\)`.

- In other words, the propensity score is the conditional probability of being treated, given the covariates.

- As a result, `\(p(X)\)` is restricted to be between 0 and 1.

- OLS is not a great estimation procedure for that, because it does not enforce these shape constraints.

---
# Moving forward with unknown `\(p(X)\)`

- So instead of assuming that `\(E[D|X] = X'\beta\)`, we can assume that

`$$p(X)= E[D|X] = \dfrac{\exp(X'\beta)}{1 + \exp(X'\beta)}.$$`

- More generally, it is common to assume that `\(p(X) = E[D|X] = \Lambda(X'\beta)\)`, where `\(\Lambda(\cdot)\)` is a link function that maps the linear predictor to the unit interval.

- Examples of link functions include the logit, probit, and cloglog functions.

- When `\(\Lambda(\cdot)\)` is the logit function, i.e., `\(\Lambda(\cdot) =\Lambda_{lg}(\cdot)\equiv \dfrac{\exp(\cdot)}{1 + \exp(\cdot)}\)`, we have the logistic regression model.

- The logistic link function is the most popular one.

---
# Estimating `\(p(X)\)`

- Logistic regression is a special case of a class of models called binary response models.

- As the name suggests, binary response models are used when the dependent variable is binary.

- In our causal inference context, the dependent variable of the logit is the treatment indicator `\(D\)`, and `\(X\)` are its regressors.

- The logit model is estimated by maximizing the (log) likelihood function, which is the probability of observing the data given the parameters.

---
# Maximum likelihood for binary outcomes

- The likelihood function for the logit model is given by:

`$$\begin{eqnarray*}
L(\beta) &=& \prod_{i=1}^n p(X_i)^{D_i} (1 - p(X_i))^{1 - D_i}\\
&=& \prod_{i=1}^n \Lambda_{lg}(X_i'\beta)^{D_i} (1 - \Lambda_{lg}(X_i'\beta))^{1 - D_i}.
\end{eqnarray*}$$`

- We do not like products that much, so we take the log to turn this into a sum:

`$$\begin{eqnarray*}
\log(L(\beta)) &=& \log\bigg(\prod_{i=1}^n \Lambda_{lg}(X_i'\beta)^{D_i} (1 - \Lambda_{lg}(X_i'\beta))^{1 - D_i}\bigg)\\
&=& \sum_{i=1}^n D_i \log\big(\Lambda_{lg}(X_i'\beta)\big) + (1 - D_i)\log\big(1 - \Lambda_{lg}(X_i'\beta)\big).
\end{eqnarray*}$$`

---
# Maximum likelihood for binary outcomes

- Our Maximum Likelihood Estimator (MLE) for `\(\beta\)` is then the vector `\(\beta\)` that maximizes the log-likelihood function:

`$$\begin{eqnarray*}
\widehat{\beta} &=& \arg\max_{\beta}~ \log(L(\beta)) \\
&=& \arg\max_{\beta} ~\sum_{i=1}^n D_i \log\big(\Lambda_{lg}(X_i'\beta)\big) + (1 - D_i)\log\big(1 - \Lambda_{lg}(X_i'\beta)\big).
\end{eqnarray*}$$`

- The log-likelihood is a concave function of `\(\beta\)`, so to find the maximum, all we need to do is take its derivative with respect to `\(\beta\)`, set it equal to zero, and solve the resulting system of equations.

- Unlike OLS, there is no closed-form solution for the estimated `\(\beta\)`.

- Solutions rely on numerical methods.
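---
# Logit MLE by hand

- To demystify the numerical step, here is a minimal sketch that maximizes the logit log-likelihood directly with `optim` (simulated data, purely illustrative; in practice we will use `glm`, as shown next):

```r
set.seed(520)
n <- 1000
x <- rnorm(n)
d <- (runif(n) <= plogis(0.5 + x))
X <- cbind(1, x)   # design matrix with an intercept

# Negative log-likelihood of the logit model (optim minimizes by default)
neg_loglik <- function(beta) {
  p <- plogis(X %*% beta)
  -sum(d * log(p) + (1 - d) * log(1 - p))
}

# Numerical maximization of the log-likelihood
mle <- optim(par = c(0, 0), fn = neg_loglik, method = "BFGS")
mle$par   # should be close to coef(glm(d ~ x, family = binomial))
```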
---
# Logit, in practice

- In practice, we can use the `glm` function in `R` to bypass all these complications:

```r
# simulate some data
n = 100
x1 <- rnorm(n)
x2 <- rnorm(n)
# generate true propensity score
ps <- plogis(x1 - x2)
# generate treatment indicator
d <- (runif(n) <= ps)
# Estimate the logit propensity score model
pscore_hat <- glm(d ~ x1 + x2, family = binomial(link = "logit"))
```

---
# Logit, in practice

```r
# summarize results
summary(pscore_hat)$coef
```

```
##              Estimate Std. Error   z value     Pr(>|z|)
## (Intercept)  0.501510  0.2601658  1.927655 5.389802e-02
## x1           1.516792  0.3771295  4.021939 5.772095e-05
## x2          -1.261149  0.3139071 -4.017588 5.879687e-05
```

```r
# Get estimated propensity scores
ps_hat <- predict(pscore_hat, type = "response")
```

---
# Logit, in practice

```r
# density plot of the estimated pscore for the two groups, using ggplot
library(ggplot2)
df_ps <- data.frame(d = d, ps_hat = ps_hat)
p1 <- ggplot(df_ps, aes(x = ps_hat, fill = factor(d))) +
  geom_density(alpha = 0.5) +
  theme_minimal() +
  labs(title = "Estimated Propensity Score by Treatment Status",
       x = "Estimated Propensity Score",
       y = "Density") +
  scale_fill_discrete(name = "Treatment Status",
                      labels = c("Control", "Treated")) +
  xlim(0, 1)
```

---
# Logit, in practice

```r
# Plot
p1
```

<img src="15slides_files/figure-html/glm4-1.png" style="display: block; margin: auto;" />

---
# Logit, in practice

- I encourage you to look at the `cobalt` package for some additional, cool tools.

- These include love plots, a graphical representation of covariate balance between the treatment and control groups.

---
# Going back to IPW

- Now that we know how to estimate `\(p(X)\)`, we are game: plug-in principle!

- The IPW estimator with unknown `\(p(X)\)` for the ATE is given by:

`$$\begin{eqnarray*}
\widehat{ATE}_{ipw} &=& \dfrac{E_n\left[\dfrac{D}{\widehat{p}(X)}Y\right]}{E_n\left[\dfrac{D}{\widehat{p}(X)}\right]} - \dfrac{E_n\left[\dfrac{1 - D}{1 - \widehat{p}(X)} Y\right]}{E_n\left[\dfrac{1-D}{1-\widehat{p}(X)}\right]} \\
&=& E_n\left[\widehat{w}_{1} (D,\widehat{p}(X)) ~Y\right] - E_n\left[\widehat{w}_{0} (D,\widehat{p}(X)) ~Y\right],
\end{eqnarray*}$$`

where `\(\widehat{p}(X_i)\)` is the estimated propensity score and

<div style="margin-top: -1.5cm;"></div>

`$$\begin{eqnarray*}
\widehat{w}_{1} (D,\widehat{p}(X)) &=& \dfrac{\dfrac{D}{\widehat{p}(X)}}{E_n\left[\dfrac{D}{\widehat{p}(X)}\right]}, ~~~~\text{and}~~~~
\widehat{w}_{0} (D,\widehat{p}(X)) = \dfrac{\dfrac{1 - D}{1 - \widehat{p}(X)}}{E_n\left[\dfrac{1-D}{1-\widehat{p}(X)}\right]}.
\end{eqnarray*}$$`

---
# Going back to IPW

- The IPW estimator with unknown `\(p(X)\)` for the ATT is given by:

`$$\begin{eqnarray*}
\widehat{ATT}_{ipw} &=& E_n\left[\dfrac{D_i}{E_n[D]}Y_i\right] - \dfrac{E_n\left[\dfrac{(1 - D)\widehat{p}(X)}{1 - \widehat{p}(X)} Y\right]}{E_n\left[\dfrac{(1 - D)\widehat{p}(X)}{1-\widehat{p}(X)}\right]}\\
&=& E_n\left[\widehat{w}_1^{att}(D) ~Y\right] - E_n\left[\widehat{w}_{0}^{att} (D,\widehat{p}(X)) ~Y\right],
\end{eqnarray*}$$`

where `\(\widehat{p}(X_i)\)` is the estimated propensity score and

`$$\begin{eqnarray*}
\widehat{w}_1^{att}(D) &=& \dfrac{D}{E_n[D]}, ~~~~\text{and}~~~~
\widehat{w}_0^{att}(D,\widehat{p}(X)) = \dfrac{\dfrac{(1-D) \widehat{p}(X)}{(1-\widehat{p}(X))}}{E_n\bigg[\dfrac{(1-D) \widehat{p}(X)}{(1-\widehat{p}(X))}\bigg]}.
\end{eqnarray*}$$`
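---
# IPW for the ATT, a quick sketch

- To make these ATT formulas concrete before the full walkthrough on the next slides, here is a minimal, self-contained sketch (the simulated design is purely illustrative; true ATT = 1):

```r
set.seed(520)
n <- 10000
x <- rnorm(n)
d <- (runif(n) <= plogis(0.5 + x))
y <- d*(1 + x) + (1 - d)*x + rnorm(n)   # true ATT = 1

# Estimate the propensity score by logit
ps_hat <- glm(d ~ x, family = binomial(link = "logit"))$fitted.values

# ATT weights: D/E_n[D] for treated units;
# normalized odds (1 - D)p(X)/(1 - p(X)) for comparison units
w1_att <- d/mean(d)
w0_att <- (1 - d)*ps_hat/(1 - ps_hat)
ATT_ipw <- mean(w1_att*y) - mean(w0_att*y)/mean(w0_att)
```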
---
# IPW, in practice

- In practice, we can do things by hand to see how they work:

```r
# simulate some data
n = 10000
x1 <- rnorm(n)
x2 <- rnorm(n)
# generate true propensity score
ps <- plogis(.5 + x1 - x2)
# generate treatment indicator
d <- (runif(n) <= ps)
# generate potential outcomes
y1 <- 1 + x1 - x2 + rnorm(n)
y0 <- x1 - x2 + rnorm(n)
# generate observed outcomes
y <- d*y1 + (1 - d)*y0
# put data in a data frame
df <- data.frame(y = y, d = d, x1 = x1, x2 = x2)
```

---
# IPW, in practice

.medium-code[

```r
# Estimate the propensity score
pscore_hat <- glm(d ~ x1 + x2, family = binomial(link = "logit"), data = df)
ps_hat <- predict(pscore_hat, type = "response")

# Estimate the ATE
# Weights for treated units
w1 <- df$d/ps_hat
w1_n <- w1/mean(w1)
# Weights for comparison units
w0 <- (1 - df$d)/(1 - ps_hat)
w0_n <- w0/mean(w0)

# Hajek-type (normalized) IPW estimator of the ATE
ATE_ipw <- mean(w1_n*df$y) - mean(w0_n*df$y)
ATE_ipw
```

```
## [1] 0.9908317
```

```r
# HT-type IPW estimator
ATE_ht <- mean(w1*df$y) - mean(w0*df$y)
ATE_ht
```

```
## [1] 0.9898402
```
]

---
# IPW, in practice

- Can you find an `R` package that implements all of this for you?

- You guessed it: the `targeted` package!

- One thing to be aware of is that `targeted` does not match one-to-one what we did (because it uses different estimation methods). We will cover other packages to avoid this mismatch.

- Let's see how it works, though:

```r
# Load the package
library(targeted)
ate_ipw <- targeted::ate(y ~ d | 1 | x1 + x2, data=df, binary=FALSE)
```

---
# IPW, in practice

<small>

```r
summary(ate_ipw)
```

```
## 
## Augmented Inverse Probability Weighting estimator
## Response y (Outcome model: gaussian):
## 	 y ~ 1
## Exposure d (Propensity model: logistic regression):
## 	 d ~ x1 + x2
## 
##          Estimate  Std.Err    2.5%   97.5%    P-value
## d=FALSE  -0.01631  0.07057 -0.1546  0.1220  8.173e-01
## d=TRUE    0.97592  0.02973  0.9176  1.0342 2.734e-236
## Outcome model:
##           0.57191  0.02011  0.5325  0.6113 5.938e-178
## Propensity model:
## (Intercept) -0.50945 0.02446 -0.5574 -0.4615  2.484e-96
## x1          -1.00041 0.02781 -1.0549 -0.9459 2.212e-283
## x2           1.00209 0.02835  0.9465  1.0577 1.155e-273
## 
## Average Treatment Effect (constrast: 'd=TRUE' - 'd=FALSE'):
## 
##     Estimate Std.Err   2.5% 97.5%   P-value
## ATE   0.9922 0.07097 0.8531 1.131 2.019e-44
```

</small>

---
# IPW, in practice, matching our slides!

- We have discussed that the `targeted` package does not match one-to-one what we did (because it uses different estimation methods).

- That leads to the question: which package avoids this?

- The answer is the `WeightIt` package!

- It also does a lot more sanity checks for us!

---
# IPW, in practice

```r
# Load the package
library(WeightIt)

# Estimate the pscore
pscore_w <- weightit(d ~ x1 + x2,
                     data = df,
                     method = "glm",
                     estimand = "ATE",
                     stabilize = TRUE)
```

---
# IPW, in practice: some sanity checks

.small-code[

```r
summary(pscore_w)
```

```
## Summary of weights
## 
## - Weight ranges:
## 
##            Min                                  Max
## treated 0.5898 |-------|                    16.4479
## control 0.4126 |---------------------------| 56.3177
## 
## - Units with the 5 most extreme weights by group:
## 
##              92    5119    5293    1691    8093
## treated 10.9703 11.7149 11.8063 13.8199 16.4479
##            7173    6026    8090    8089    6320
## control 12.7311 12.8511 13.5943 19.7831 56.3177
## 
## - Weight statistics:
## 
##         Coef of Var   MAD Entropy # Zeros
## treated       0.739 0.372   0.150       0
## control       1.295 0.539   0.301       0
## 
## - Effective Sample Sizes:
## 
##            Control Treated
## Unweighted   4112.   5888.
## Weighted   1536.66 3809.28
```
]
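---
# IPW, in practice: effective sample sizes

- The "Effective Sample Sizes" above summarize how much information survives the weighting. As far as I know, `WeightIt` reports Kish's effective sample size, which we can sketch by hand (reusing `pscore_w` and `df` from the previous slides):

```r
# Kish's effective sample size: (sum of weights)^2 / sum of squared weights
ess <- function(w) sum(w)^2 / sum(w^2)

w <- pscore_w$weights
ess(w[df$d])    # treated units
ess(w[!df$d])   # control units
```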
---
# IPW, in practice, with normalized weights

.small-code[

```r
# Now, let's actually get the ATE
ate_ipw_w <- lm_weightit(y ~ d,
                         data = df,
                         weightit = pscore_w,
                         vcov = "asympt")
summary(ate_ipw_w)
```

```
## 
## Call:
## lm_weightit(formula = y ~ d, data = df, weightit = pscore_w,
##     vcov = "asympt")
## 
## Coefficients:
##             Estimate Std. Error z value Pr(>|z|)    
## (Intercept) -0.01449    0.07828  -0.185    0.853    
## dTRUE        0.99083    0.07814  12.681   <2e-16 ***
## Standard error: HC0 robust (adjusted for estimation of weights)
```

```r
# Compare it with the normalized ATE by hand
ATE_ipw
```

```
## [1] 0.9908317
```

```r
# Compare with the one from the targeted package
summary(ate_ipw)$asso
```

```
##     Estimate Std.Err   2.5% 97.5%   P-value
## ATE   0.9922 0.07097 0.8531 1.131 2.019e-44
```
]

---
# IPW, in practice: Balance checks

.small-code[

```r
# Balance checks
cobalt::love.plot(pscore_w, binary = "std", thresholds = c(m = .1))
```

<img src="15slides_files/figure-html/weightit8-1.png" style="display: block; margin: auto;" />
]

---
# IPW, in practice: Overlap

.small-code[

```r
cobalt::bal.plot(pscore_w, which = "both") +
  labs(title = "Propensity Score Overlap") +
  xlim(c(0,1)) +
  labs(x = "Propensity Score", y = "Density")
```

<img src="15slides_files/figure-html/weightit9-1.png" style="display: block; margin: auto;" />
]
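---
# IPW, in practice: what about the ATT?

- The same workflow extends to the ATT. Here is a sketch, reusing the simulated `df` from before; essentially only the `estimand` argument changes:

```r
# ATT weights: treated units get weight 1 (up to normalization);
# comparison units get the propensity-score odds p(X)/(1 - p(X))
pscore_att <- weightit(d ~ x1 + x2,
                       data = df,
                       method = "glm",
                       estimand = "ATT")

att_ipw_w <- lm_weightit(y ~ d,
                         data = df,
                         weightit = pscore_att,
                         vcov = "asympt")
summary(att_ipw_w)
```

- As before, it is good practice to rerun the balance and overlap checks with these ATT weights.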