class: title-slide

# Econ 520: Data Science for Economists
## Lecture 12: Linear Regression Refresher

<br>

<p align=center> Pedro H. C. Sant'Anna </p>
<div style="margin-top: -.7cm;"></div>
<p align=center> Emory University </p>

<br>

<p align=center> Spring 2024 </p>

---
class: center, middle
name: prologue

# Lecture Structure

<html><div style='float:left'></div><hr color='#EB811B' size=1px width=1100px></html>

---

# Main Goals

- The main goal of this lecture is to provide a quick refresher of linear regression analysis, building on what you have seen in .hi[Econ 320].

- We will cover the basics of linear regression analysis in R.

- We will cover estimation and heteroskedasticity-robust inference.

- We will also cover marginal effects and other post-estimation tools.

---

# R packages

- We will use several R packages today.

- New: **estimatr**, **lmtest**, **broom**

- I'll try to be explicit about where a particular function is coming from whenever we use it below.

- A convenient way to install (if necessary) and load everything is to run the code chunk below.

```r
## Load and install the packages that we'll be using today
if (!require("pacman")) install.packages("pacman")
pacman::p_load(tidyverse, estimatr, sandwich, lmtest, margins, broom, modelsummary)
# Plot theme
theme_set(hrbrthemes::theme_ipsum())
```

---

# Example: starwars data

- One thing I want to mention up front is that we'll mostly be working with the `starwars` data frame that we've already seen in previous lectures.

- This is not a big deal because I just want to refresh your memory about some regression tools.

- Here's a quick reminder of what the `starwars` data frame looks like.

```r
starwars
```

```
## # A tibble: 87 × 14
##    name     height  mass hair_color skin_color eye_color birth_year sex   gender
##    <chr>     <int> <dbl> <chr>      <chr>      <chr>          <dbl> <chr> <chr> 
##  1 Luke Sk…    172    77 blond      fair       blue            19   male  mascu…
##  2 C-3PO       167    75 <NA>       gold       yellow         112   none  mascu…
##  3 R2-D2        96    32 <NA>       white, bl… red             33   none  mascu…
##  4 Darth V…    202   136 none       white      yellow          41.9 male  mascu…
##  5 Leia Or…    150    49 brown      light      brown           19   fema… femin…
##  6 Owen La…    178   120 brown, gr… light      blue            52   male  mascu…
##  7 Beru Wh…    165    75 brown      light      blue            47   fema… femin…
##  8 R5-D4        97    32 <NA>       white, red red             NA   none  mascu…
##  9 Biggs D…    183    84 black      light      brown           24   male  mascu…
## 10 Obi-Wan…    182    77 auburn, w… fair       blue-gray       57   male  mascu…
## # ℹ 77 more rows
## # ℹ 5 more variables: homeworld <chr>, species <chr>, films <list>,
## #   vehicles <list>, starships <list>
```

---

# Regression Basics

- R's workhorse command for running linear regression models is the built-in `lm()` function.

- The "**lm**" stands for "**l**inear **m**odels" and the syntax is very intuitive.

```r
lm(y ~ x1 + x2 + x3 + ..., data = df)
```

- You'll note that the `lm()` call includes a reference to the data source (in this case, a hypothetical data frame called `df`).

- All the variables `y`, `x1`, `x2`, `x3`, etc. are assumed to be columns in `df`.

---

# Example: OLS

- Let's run a simple bivariate regression of mass on height using our dataset of starwars characters.

```r
ols1 = lm(mass ~ height, data = starwars)
ols1
```

```
## 
## Call:
## lm(formula = mass ~ height, data = starwars)
## 
## Coefficients:
## (Intercept)       height  
##     -11.487        0.624
```
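- Once fit, the model object works immediately with R's standard post-estimation tools. Here is a minimal prediction sketch; the 180 cm input is just an arbitrary illustrative value:

```r
## Predicted mass for a hypothetical 180 cm tall character
predict(ols1, newdata = data.frame(height = 180))
```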
---

# Example: OLS (continued)

- The resulting object is pretty terse, but that's only because it buries most of its valuable information --- of which there is a lot --- within its internal list structure.

- If you're in RStudio, you can inspect this structure by typing `View(ols1)` or simply clicking on the "ols1" object in your environment pane.

- Doing so will prompt an interactive panel to pop up for you to play around with.

- That approach won't work for this knitted R Markdown document, however, so I'll use the `listviewer::jsonedit()` function instead.

---

# Example: OLS (continued)

```r
# View(ols1) ## Run this instead if you're in a live session
listviewer::jsonedit(ols1, mode = "view", height = "400px") ## Better for R Markdown
```
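- If you just want to grab a particular piece of the model rather than browse the whole object, base R's accessor functions pull the main slots out directly. A minimal sketch:

```r
coef(ols1)         ## Regression coefficients
head(resid(ols1))  ## First few residuals
head(fitted(ols1)) ## First few fitted (i.e. predicted) values
```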
---

# Example: OLS (continued)

- As we can see, this `ols1` object has a bunch of important slots, including
  - regression coefficients,
  - vectors of the residuals and fitted (i.e. predicted) values,
  - the rank of the design matrix,
  - the input data,
  - etc. etc.

---

# Example: OLS (continued)

- To summarize the key pieces of information, we can use the `summary()` function.

```r
summary(ols1)
```

```
## 
## Call:
## lm(formula = mass ~ height, data = starwars)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
##  -60.95  -29.51  -20.83  -17.65 1260.29 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept) -11.4868   111.3842  -0.103    0.918
## height        0.6240     0.6262   0.997    0.323
## 
## Residual standard error: 169.5 on 57 degrees of freedom
##   (28 observations deleted due to missingness)
## Multiple R-squared:  0.01712,	Adjusted R-squared:  -0.0001194
## F-statistic: 0.9931 on 1 and 57 DF,  p-value: 0.3232
```

---

# Example: OLS (continued)

We can then dig down further by extracting a summary of the regression coefficients:

```r
summary(ols1)$coefficients
```

```
##                Estimate  Std. Error    t value  Pr(>|t|)
## (Intercept) -11.4868157 111.3841576 -0.1031279 0.9182234
## height        0.6240033   0.6261744  0.9965328 0.3232031
```

---

# Get "tidy" OLS coefficients with the `broom` package

- While it's easy to extract regression coefficients via the `summary()` function, in practice I always use the **broom** package ([link](https://broom.tidyverse.org/)) to do so.

- **broom** has a bunch of neat features to convert regression (and other statistical) objects into "tidy" data frames.

- This is especially useful because regression output is so often used as an input to something else, e.g., a plot of coefficients or marginal effects.

---

# Get "tidy" OLS coefficients with the `broom` package (continued)

- Here, I'll use `broom::tidy(..., conf.int = TRUE)` to coerce the `ols1` OLS object into a tidy data frame of coefficient values and key statistics.

```r
# library(broom) ## Already loaded
tidy(ols1, conf.int = TRUE)
```

```
## # A tibble: 2 × 7
##   term        estimate std.error statistic p.value conf.low conf.high
##   <chr>          <dbl>     <dbl>     <dbl>   <dbl>    <dbl>     <dbl>
## 1 (Intercept)  -11.5     111.       -0.103   0.918 -235.      212.   
## 2 height         0.624     0.626     0.997   0.323   -0.630     1.88
```

---

# Get "tidy" OLS coefficients with the `broom` package (continued)

- Again, I could now pipe this tidied coefficients data frame to a **ggplot2** call, using, say, `geom_pointrange()` to plot the error bars.

- We'll get to some explicit examples further below.

---

# Regressing on subsetted data

- Different species and homeworlds aside, we may have an extreme outlier in our data.

<img src="12slides_files/figure-html/jabba-1.png" style="display: block; margin: auto;" />

---

# How to handle this?

- Maybe we should exclude Jabba from our regression?

- You can do this in two ways: 1) Create a new data frame and then regress, or 2) Subset the original data frame directly in the `lm()` call.

---

# Option 1: Create a new data frame

- Recall that we can keep multiple objects in memory in R.

- So we can easily create a new data frame that excludes Jabba using, say, **dplyr**.

```r
starwars2 = starwars %>% filter(name != "Jabba Desilijic Tiure")
# filter(!(grepl("Jabba", name))) ## Regular expressions also work

ols2 = lm(mass ~ height, data = starwars2)
summary(ols2)
```

```
## 
## Call:
## lm(formula = mass ~ height, data = starwars2)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -39.006  -7.804   0.508   4.007  57.901 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -31.25047   12.81488  -2.439   0.0179 *  
## height        0.61273    0.07202   8.508 1.14e-11 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 19.49 on 56 degrees of freedom
##   (28 observations deleted due to missingness)
## Multiple R-squared:  0.5638,	Adjusted R-squared:  0.556
## F-statistic: 72.38 on 1 and 56 DF,  p-value: 1.138e-11
```
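- With both fits in memory, it's easy to see how much Jabba was driving the original results. A minimal side-by-side sketch using the **modelsummary** package we loaded at the start (output omitted here):

```r
## Compare the with- and without-Jabba regressions in one table
modelsummary(list("Full sample" = ols1, "No Jabba" = ols2))
```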
---

# Option 2: Subset directly in the `lm()` call

Our other alternative is to subset the data directly in the `lm()` call.

```r
ols2a = lm(mass ~ height, data = starwars %>% filter(!(grepl("Jabba", name))))
summary(ols2a)
```

```
## 
## Call:
## lm(formula = mass ~ height, data = starwars %>% filter(!(grepl("Jabba", 
##     name))))
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -39.006  -7.804   0.508   4.007  57.901 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -31.25047   12.81488  -2.439   0.0179 *  
## height        0.61273    0.07202   8.508 1.14e-11 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 19.49 on 56 degrees of freedom
##   (28 observations deleted due to missingness)
## Multiple R-squared:  0.5638,	Adjusted R-squared:  0.556
## F-statistic: 72.38 on 1 and 56 DF,  p-value: 1.138e-11
```

---

# The issue with `lm` is inference

- The `lm` function is great for estimating the coefficients of a linear model, but it's not so great for inference.

- The `lm` function assumes that the variance of the residuals is constant across all levels of the covariates.

- This is known as the homoskedasticity assumption.

- I strongly recommend that you not rely on it, as this assumption is .hi[VERY] often violated.

- How can we fix that?

---

# Heteroskedastic-robust standard errors

- One way to fix this is to use heteroskedastic-robust standard errors.

- This is done by using the `coeftest` function from the `lmtest` package, combined with a robust variance estimator like `vcovHC` from the `sandwich` package.

```r
# library(lmtest) already loaded
lmtest::coeftest(ols2a, vcov = vcovHC(ols2a, type = "HC2")) %>%
  tidy(conf.int = TRUE)
```

```
## # A tibble: 2 × 7
##   term        estimate std.error statistic  p.value conf.low conf.high
##   <chr>          <dbl>     <dbl>     <dbl>    <dbl>    <dbl>     <dbl>
## 1 (Intercept)  -31.3      9.23       -3.38 1.31e- 3  -49.7     -12.8  
## 2 height         0.613    0.0608     10.1  3.54e-14    0.491     0.735
```

---

# `estimatr::lm_robust()` as a one-stop alternative

- Instead of using `lm` and then `coeftest`, you can use the `estimatr` package to do both at once.

- Let's illustrate by implementing a robust version of the `ols2a` regression that we ran above.

- Note that **estimatr** models automatically print in a pleasing tidied/summary format, although you can certainly pipe them to `tidy()` too.

```r
# library(estimatr) ## Already loaded
ols1_robust = lm_robust(mass ~ height, data = starwars %>% filter(!(grepl("Jabba", name))))
# tidy(ols1_robust, conf.int = TRUE) ## Could tidy too
ols1_robust
```

```
##                Estimate  Std. Error   t value     Pr(>|t|)    CI Lower
## (Intercept) -31.2504692  9.23314438 -3.384597 1.307917e-03 -49.7466800
## height        0.6127301  0.06084511 10.070327 3.544723e-14   0.4908427
##                CI Upper DF
## (Intercept) -12.7542584 56
## height        0.7346175 56
```
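- Note that `lm_robust()` reproduces the HC2 standard errors from the previous slide because HC2 is its default. You can request other variance estimators via its `se_type` argument. A minimal sketch, where the "HC3" choice is just for illustration:

```r
## Same regression, but with (more conservative) HC3 standard errors
lm_robust(mass ~ height,
          data = starwars %>% filter(!(grepl("Jabba", name))),
          se_type = "HC3")
```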
---
class: center, middle
name: Interactions

# Dummy variables and interactions

<html><div style='float:left'></div><hr color='#EB811B' size=1px width=1100px></html>

---

# Focusing on humans

- For the next few sections, it will prove convenient to demonstrate using a subsample of the starwars data that comprises only the human characters.

- Let's quickly create this new dataset before continuing.

```r
humans = starwars %>%
  filter(species == "Human") %>%
  select(where(Negate(is.list))) ## Drop list columns (optional)
humans
```

```
## # A tibble: 35 × 11
##    name     height  mass hair_color skin_color eye_color birth_year sex   gender
##    <chr>     <int> <dbl> <chr>      <chr>      <chr>          <dbl> <chr> <chr> 
##  1 Luke Sk…    172    77 blond      fair       blue            19   male  mascu…
##  2 Darth V…    202   136 none       white      yellow          41.9 male  mascu…
##  3 Leia Or…    150    49 brown      light      brown           19   fema… femin…
##  4 Owen La…    178   120 brown, gr… light      blue            52   male  mascu…
##  5 Beru Wh…    165    75 brown      light      blue            47   fema… femin…
##  6 Biggs D…    183    84 black      light      brown           24   male  mascu…
##  7 Obi-Wan…    182    77 auburn, w… fair       blue-gray       57   male  mascu…
##  8 Anakin …    188    84 blond      fair       blue            41.9 male  mascu…
##  9 Wilhuff…    180    NA auburn, g… fair       blue            64   male  mascu…
## 10 Han Solo    180    80 brown      fair       brown           29   male  mascu…
## # ℹ 25 more rows
## # ℹ 2 more variables: homeworld <chr>, species <chr>
```

---

# Dummy variables

- Dummy variables are a core component of many regression models.

- However, these can be a pain to create in some statistical languages, since you first have to tabulate a whole new matrix of binary variables and then append it to the original data frame.

- In contrast, R has a very convenient framework for creating and evaluating dummy variables in a regression: Simply specify the variable of interest as a [factor](https://r4ds.had.co.nz/factors.html).

---

# Dummy variables (cont'd)

- Here's an example where we explicitly tell R that "gender" is a factor. Since I don't plan on reusing this model, I'm just going to print the results to screen rather than saving it to my global environment.

```r
summary(estimatr::lm_robust(mass ~ height + as.factor(gender), data = humans))$coefficients
```

```
##                               Estimate  Std. Error    t value  Pr(>|t|)
## (Intercept)                -39.5021260  93.3927675 -0.4229677 0.6776209
## height                       0.5750128   0.5869553  0.9796534 0.3409941
## as.factor(gender)masculine  20.1953844  18.4225031  1.0962346 0.2882632
##                                 CI Lower  CI Upper DF
## (Intercept)                -236.5436417 157.53939 17
## height                       -0.6633547   1.81338 17
## as.factor(gender)masculine  -18.6726996  59.06347 17
```

- Okay, I should tell you that I'm actually making things more complicated than they need to be with the heavy-handed emphasis on factors.

- A case in point is that we don't actually *need* to specify a string (i.e. character) variable as a factor in a regression. R will automatically do this for you, as long as the variable is not a number.

---

# Dummy variables (cont'd)

```r
## Use the non-factored version of "gender" instead; R knows it must be converted
## to a factor for it to be included as a regression variable
summary(estimatr::lm_robust(mass ~ height + gender, data = humans))$coefficients
```

```
##                    Estimate Std. Error    t value  Pr(>|t|)      CI Lower
## (Intercept)     -39.5021260 93.3927675 -0.4229677 0.6776209 -236.5436417
## height            0.5750128  0.5869553  0.9796534 0.3409941   -0.6633547
## gendermasculine  20.1953844 18.4225031  1.0962346 0.2882632  -18.6726996
##                  CI Upper DF
## (Intercept)     157.53939 17
## height            1.81338 17
## gendermasculine  59.06347 17
```
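- If you ever want to see exactly which dummy columns R creates behind the scenes, base R's `model.matrix()` exposes the expanded design matrix. A minimal sketch:

```r
## Peek at the design matrix implied by the formula. The character column
## "gender" is automatically expanded into a 0/1 dummy (gendermasculine).
head(model.matrix(~ height + gender, data = humans))
```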
---

# Interactions

- As with dummy variables, R provides a convenient syntax for specifying interaction terms directly in the regression model without having to create them manually.

- You can use any of the following expansion operators:

  - `x1:x2` "crosses" the variables (equivalent to including only the `x1 × x2` interaction term)

  - `x1/x2` "nests" the second variable within the first (equivalent to `x1 + x1:x2`)

  - `x1*x2` includes all parent and interaction terms (equivalent to `x1 + x2 + x1:x2`)

- As a rule of thumb, it is generally advisable to include all of the parent terms alongside their interactions.

- This makes the `*` option a good default.

---

# Interactions: Example

We might wonder whether the relationship between a person's body mass and their height is modulated by their gender. That is, we want to run a regression of the form,

`$$Mass = \beta_0 + \beta_1 D_{Male} + \beta_2 Height + \beta_3 D_{Male} \times Height$$`

To implement this in R, we simply run the following:

```r
ols_ie = estimatr::lm_robust(mass ~ gender * height, data = humans)
summary(ols_ie)$coefficients
```

```
##                           Estimate  Std. Error    t value  Pr(>|t|)
## (Intercept)             87.8648649 151.9183924  0.5783688 0.5710671
## gendermasculine       -177.2710448 193.7338992 -0.9150234 0.3737641
## height                  -0.1891892   0.9081564 -0.2083223 0.8376061
## gendermasculine:height   1.1479992   1.1278747  1.0178429 0.3238976
##                            CI Lower   CI Upper DF
## (Intercept)             -234.187740 409.917470 16
## gendermasculine         -587.968564 233.426475 16
## height                    -2.114395   1.736016 16
## gendermasculine:height    -1.242988   3.538987 16
```

.hi[How do you interpret the coefficients?]
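---

# Interactions: Example (continued)

- One way to check your interpretation is to compute the implied group-specific slopes directly from the fitted object. A minimal sketch using base R's `coef()` on `ols_ie` (no new data involved):

```r
b = coef(ols_ie)

## Slope of mass on height for feminine characters (the base group): beta_2
b["height"]

## Slope of mass on height for masculine characters: beta_2 + beta_3
b["height"] + b["gendermasculine:height"]
```

- In other words, the interaction coefficient is the *difference* in slopes between the two groups.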