# Causality II

Data Analytics and Visualization with R
Session 5

Viktoriia Semenova

University of Mannheim
Spring 2023

# Intro

## Housekeeping

• Decide on your teammates by Monday: DM me on Slack/email
• Deadline for Problem Set 4: today, 23:59
• Deadline for Problem Set 5: Tuesday March 21, 23:59

## Quiz: Which of these statements are correct?


For the relationship between Beauty and Talent, being a Movie star is:

1. a confounder and thus should be accounted for.
2. a mediator and thus should not be accounted for.
3. a collider and here we accounted for it when fitting the regression line.
4. a collider and accounting for it masks the true relationship between Beauty and Talent.

## Substantive Effect of X on Y

Claim: Extreme values of the correlation coefficient (i.e. close to -1 or 1) imply a large substantive effect of $X$ on $Y$

```r
cov(x, y)
#> 2.636158
cor(x, y)
#> 0.9968431
cor(x, y / 100) # change the scale of happiness index
#> 0.9968431
lm(y ~ x) %>% coef()
#> (Intercept)           x
#>  20.0270384   0.4971041
lm(y / 100 ~ x) %>% coef() # change the scale of happiness index
#> (Intercept)           x
#> 0.200270384 0.004971041
```
• The slope coefficient combines two pieces of information: (1) the correlation and (2) the relative scales of the variables
• A stronger correlation implies a slope coefficient that is larger in magnitude (holding the scales fixed)
• But does this mean that the effect is substantively large?
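The two pieces of information can be made explicit with the standard bivariate OLS formula (here $r_{XY}$ is the correlation and $s_X$, $s_Y$ are the sample standard deviations):

$$\hat\beta = \frac{\widehat{\text{cov}}(X, Y)}{\widehat{\text{var}}(X)} = r_{XY} \cdot \frac{s_Y}{s_X}$$

Rescaling $Y$ by $1/100$ divides $s_Y$, and hence $\hat\beta$, by 100, while $r_{XY}$ is unchanged: exactly the pattern in the output above.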

## Substantive Effect of X on Y

Claim: Extreme values of the correlation coefficient (i.e. close to -1 or 1) imply a large substantive effect of $X$ on $Y$

To say that the effect is substantively large, we need substantive information about the scales of these variables:

• what are the plausible values of $Y$ and $X$?
• is a change of 0.497 in $Y$ a practically, substantively large one?
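One way to put the slope on a comparable footing is to standardize both variables, in which case the slope equals the correlation. A minimal sketch with simulated data (hypothetical values, not the slide's happiness example):

```r
set.seed(42)
x <- rnorm(200, mean = 50, sd = 2)         # hypothetical predictor
y <- 20 + 0.5 * x + rnorm(200, sd = 0.3)   # hypothetical outcome

# Correlation is unaffected by rescaling y ...
cor(x, y)
cor(x, y / 100)

# ... but the slope shrinks by the same factor of 100
coef(lm(y ~ x))["x"]
coef(lm(y / 100 ~ x))["x"]

# After standardizing both variables, the slope equals the correlation
coef(lm(scale(y) ~ scale(x)))[2]
```

Even then, whether a one-standard-deviation change in $X$ is itself meaningful still requires substantive knowledge of the variables.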

## Agenda for Today

Paths and Backdoors: DAGs and Causal Identification

Lab: Drawing DAGs in dagitty.net and ggdag

# Causal Diagrams, Paths, and Backdoors

## Why Care About Causal Diagrams?

DAGs represent the underlying data-generating process

• help clarify study question and relevant concepts
• provide a common language to talk about theories and causal relationships (a systematic way to discuss what is missing, such as a node or a path)
• make our assumptions about DGP explicit
• help determine whether the effect of interest can be identified from available data
• allow us to determine which variables we need to account for to be able to estimate the causal effect (isolate specific pathways)

## Steps to Causal Diagram

1. Identify your treatment $X$ and outcome $Y$ variables
2. List the variables (nodes) plausibly related to the relationship you are trying to identify, including unobserved and unmeasurable ones
3. For simplicity, combine similar variables or prune those least likely to be important
4. Consider which variables are likely to affect which others and draw arrows accordingly
5. List all paths that connect $X$ to $Y$, regardless of the direction of the arrows
6. Identify the paths with an arrow pointing back towards $X$ (backdoor paths)
7. Control for variables along those paths to close the backdoors

## Paths Glossary

Frontdoor Path

A path where all the arrows point away from Treatment $X$

Backdoor Path

A path where at least one of the arrows points towards Treatment $X$

Open Path

A path in which there is variation in all variables along the path (and no variation in any colliders on that path)

Closed Path

A path in which there is at least one variable with no variation (or a collider with variation)

Our goal: block all backdoor paths to identify the main pathway we care about

## Finding Paths

• $X$ causes $Y$
• $Z$ causes both $X$ and $Y$
• $Z$ confounds the $X⟶Y$ association
• Paths between $X$ and $Y$:
• $X⟶Y$
• $X⟵Z⟶Y$
• $X⟵Z⟶Y$ is a backdoor path
• Even if there were no $X⟶Y$ arrow, $Z$ would still make $X$ and $Y$ correlated
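These paths can be checked mechanically in R with the dagitty package (the same engine behind dagitty.net); a minimal sketch, assuming dagitty is installed:

```r
library(dagitty)

# Encode the DAG: X <- Z -> Y plus the direct effect X -> Y
g <- dagitty("dag {
  X -> Y
  Z -> X
  Z -> Y
}")

# List all paths between X and Y and whether each is open:
# the frontdoor X -> Y and the backdoor X <- Z -> Y
paths(g, "X", "Y")

# Minimal adjustment set that closes the backdoor
adjustmentSets(g, exposure = "X", outcome = "Y")
```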

## Finding Paths: Campaign Money Example

1. Money ⟶ Total Votes
2. Money ⟵ Candidate Quality ⟶ Total Votes
• Accounting for Quality closes the backdoor
• In other words, we:
• compare candidates as if they had the same Quality
• remove differences that are predicted by Quality
• hold Quality constant

## Finding Paths: A More Complex DAG

List all paths that connect Money raised with Total Votes (regardless of the direction of arrows). Which of them are backdoor paths?

1. Money ⟶ Total Votes
2. Money ⟶ Hire campaign manager ⟶ Total Votes
3. Money ⟶ Won Election ⟵ Total Votes
4. Money ⟵ Candidate Quality ⟶ Total Votes
5. Money ⟵ District ⟶ Total Votes
6. Money ⟵ Party ⟶ Total Votes
7. Money ⟵ District ⟵ History ⟶ Party ⟶ Total Votes
8. Money ⟵ Party ⟵ History ⟶ District ⟶ Total Votes

## Closing Backdoor Paths Is the Goal


Frontdoor Paths:

1. Money ⟶ Total Votes
2. Money ⟶ Hire campaign manager ⟶ Total Votes

Closed Path (blocked by the collider Won Election):

1. Money ⟶ Won Election ⟵ Total Votes

Open Backdoor Paths:

1. Money ⟵ Candidate Quality ⟶ Total Votes
2. Money ⟵ District ⟶ Total Votes
3. Money ⟵ Party ⟶ Total Votes
4. Money ⟵ District ⟵ History ⟶ Party ⟶ Total Votes
5. Money ⟵ Party ⟵ History ⟶ District ⟶ Total Votes
• Adjusting for Quality, District, and Party closes all open backdoor paths ⟶ Yes, control for them!
• The unobserved History then no longer confounds Money and Votes.
• Adjusting for Won Election (a collider) opens a closed path ⟶ No, do not control for it!
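The whole exercise can be reproduced with the dagitty package in R; a sketch, assuming dagitty is installed and using shortened node names for the slide's labels:

```r
library(dagitty)

# The campaign-money DAG (node names shortened from the slide)
g <- dagitty("dag {
  money -> votes
  money -> manager
  manager -> votes
  money -> won
  votes -> won
  quality -> money
  quality -> votes
  district -> money
  district -> votes
  party -> money
  party -> votes
  history -> district
  history -> party
}")

# All eight paths between money and votes, with open/closed status
paths(g, "money", "votes")

# The minimal set of controls: should be district, party, and quality
adjustmentSets(g, exposure = "money", outcome = "votes")

# Conditioning on the collider opens the previously closed path
# money -> won <- votes
paths(g, "money", "votes", Z = "won")
```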

## Backdoor Criterion Comes from $do$-calculus

• If we apply a set of logical rules called $do$-calculus, we can strip away confounding relationships and isolate the effect of interest in observational data
• The $do()$ operator represents a direct intervention in a DAG: it sets a node to a particular value (e.g., Money Raised = \$10,000), as if we were running an experiment
• We would like to estimate $E[\text{Total Votes} \mid do(\text{Money raised})]$, but from observational data we can only estimate $E[\text{Total Votes} \mid \text{Money raised}]$, and in general $E[\text{Total Votes} \mid do(\text{Money raised})] \neq E[\text{Total Votes} \mid \text{Money raised}]$
• Applying the rules of $do$-calculus allows us to remove the $do()$ operator and make the causal effect identifiable (i.e. isolated)
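For instance, when a set of variables $Z$ satisfies the backdoor criterion, the rules of $do$-calculus reduce to the backdoor adjustment formula:

$$E[Y \mid do(X = x)] = \sum_{z} E[Y \mid X = x, Z = z] \, P(Z = z)$$

The right-hand side contains no $do()$ and can therefore be estimated from observational data.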

If you can transform $do()$ expressions to $do$-free versions, you can legally make causal inferences from observational data

## Main Take Aways

• Causal diagrams sketch the DGP and allow us to see if the effect of interest is identifiable, i.e. that we can isolate the path(s) that refer to our effect
• There will often be more than one single appropriate DAG
• Backdoor paths create systematic, noncausal correlations between the causal variable of interest and the outcome you are trying to study, and we need to close them
• If you assume your DAG is right and you closed all backdoors, you can legally make causal inferences from observational data 