Causality II

Data Analytics and Visualization with R
Session 5

Viktoriia Semenova

University of Mannheim
Spring 2023

Intro

Housekeeping

  • Decide on your teammates by Monday and let me know via Slack DM or email
  • Deadline for Problem Set 4: today, 23:59
  • Deadline for Problem Set 5: Tuesday, March 21, 23:59

Quiz: Which of these statements are correct?


For the relationship between Beauty and Talent, being a Movie star is:

  1. a confounder and thus should be accounted for.
  2. a mediator and thus should not be accounted for.
  3. a collider and here we accounted for it when fitting the regression line.
  4. a collider and accounting for it masks the true relationship between Beauty and Talent.

Major Types of Association


Confounding

[DAG: Z ⟶ X, Z ⟶ Y, X ⟶ Y]

Mediation

[DAG: X ⟶ Z, Z ⟶ Y, X ⟶ Y]

Collision

[DAG: X ⟶ Z, Y ⟶ Z, X ⟶ Y]

Collider Bias: Movie Stars
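A minimal simulation of the movie-star story, under hypothetical assumptions: Beauty and Talent are drawn as independent standard normals, and someone becomes a movie star when beauty + talent lands in the top 10%. Looking only at stars (i.e., conditioning on the collider) produces a strong negative correlation that does not exist in the full population.

```r
set.seed(42)
n <- 10000
beauty <- rnorm(n)
talent <- rnorm(n)  # generated independently of beauty: no true relationship

# Hypothetical selection rule: top 10% on beauty + talent become movie stars
movie_star <- (beauty + talent) > quantile(beauty + talent, 0.9)

cor_all   <- cor(beauty, talent)                          # near 0 in everyone
cor_stars <- cor(beauty[movie_star], talent[movie_star])  # clearly negative among stars
round(c(everyone = cor_all, movie_stars = cor_stars), 2)
```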

Substantive Effect of X on Y

Do extreme values of the correlation coefficient (i.e., close to -1 or 1) imply that there is a large substantive effect of \(X\) on \(Y\)?

cov(x, y)
[1] 2.636158
cor(x, y)
[1] 0.9968431
cor(x, y/100) # change the scale of happiness index
[1] 0.9968431
lm(y ~ x) %>% coef()
(Intercept)           x 
 20.0270384   0.4971041 
lm(y/100 ~ x) %>% coef() # change the scale of happiness index
(Intercept)           x 
0.200270384 0.004971041 
  • The slope coefficient combines information about (1) the correlation and (2) the relative scales of the variables
  • Holding the scales fixed, a stronger correlation implies a slope coefficient of larger magnitude
  • But does this mean that the effect is substantively large?
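The first bullet can be checked directly: in a simple OLS regression the slope equals cor(x, y) × sd(y) / sd(x). A sketch with simulated, hypothetical data (the x and y behind the output above are not available here), written in base R without the pipe:

```r
set.seed(1)
x <- runif(200, 0, 10)                    # hypothetical predictor
y <- 20 + 0.5 * x + rnorm(200, sd = 0.2)  # hypothetical happiness index

b <- unname(coef(lm(y ~ x))["x"])
r <- cor(x, y)
b_from_r <- r * sd(y) / sd(x)  # slope = correlation x ratio of scales
all.equal(b, b_from_r)         # TRUE

# Rescaling y changes the slope by the same factor but leaves the correlation unchanged
b_scaled <- unname(coef(lm(I(y / 100) ~ x))["x"])
c(slope = b, slope_rescaled = b_scaled, cor = r, cor_rescaled = cor(x, y / 100))
```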


To say that the effect is substantively large, we need substantive information about the scales of these variables:

  • what are the plausible values of \(Y\) and \(X\)?
  • is a change of 0.497 in \(Y\) per one-unit increase in \(X\) practically, substantively large?
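One way to put a slope on a more interpretable scale (a sketch, not the only convention) is to translate it into the change in \(Y\) implied by a realistic shift in \(X\), here its interquartile range, and compare that change with the spread of \(Y\). The data are hypothetical, and whether the resulting number counts as "large" still depends on knowing what \(Y\) measures.

```r
set.seed(1)
x <- runif(200, 0, 10)                    # hypothetical predictor
y <- 20 + 0.5 * x + rnorm(200, sd = 0.2)  # hypothetical outcome
b <- unname(coef(lm(y ~ x))["x"])

# Predicted change in y when x moves from its 25th to its 75th percentile
iqr_shift <- b * IQR(x)

# Express that change relative to the spread and the observed range of y
c(change_in_y    = iqr_shift,
  in_sd_of_y     = iqr_shift / sd(y),
  share_of_range = iqr_shift / diff(range(y)))
```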

Agenda for Today


Paths and Backdoors: DAGs and Causal Identification


Lab: Drawing DAGs in dagitty.net and ggdag

Causal Diagrams, Paths, and Backdoors

Why Care About Causal Diagrams?

DAGs (directed acyclic graphs) represent the underlying data-generating process (DGP)

  • help clarify the study question and relevant concepts
  • provide a common language for discussing theories and causal relationships (a systematic way to talk about what is missing, like a node or a path)
  • make our assumptions about the DGP explicit
  • help determine whether the effect of interest can be identified from available data
  • allow us to determine which variables we need to account for to estimate the causal effect (i.e., to isolate specific pathways)

Steps to a Causal Diagram

  1. Identify your treatment \(X\) and outcome \(Y\) variables
  2. List possible variables (nodes) related to the relationship you are trying to identify, including unobserved and unmeasurable ones
  3. For simplicity, combine related variables or prune the ones least likely to be important
  4. Consider which variables are likely to affect which other variables and draw arrows from one to the other
  5. List all paths that connect \(X\) to \(Y\), regardless of the direction of arrows
  6. Identify any pathways that have arrows pointing backwards towards \(X\)
  7. Control for nodes on these paths that point back to \(X\), so that every backdoor is blocked, but never for colliders (aka Close Backdoors)

Paths Glossary

Frontdoor Path

A path where all the arrows point away from Treatment \(X\)

Backdoor Path

A path where at least one of the arrows points towards Treatment \(X\)

Open Path

A path in which every non-collider variable along the path has variation (is not controlled for) and every collider on it has no variation (is controlled for)

Closed Path

A path in which at least one non-collider variable has no variation (is controlled for), or at least one collider has variation (is not controlled for)

Our goal: block all backdoor paths to identify the main pathway we care about
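The open/closed rules above can be written as a small base R function. This is a sketch under a simplifying assumption: "controlled" (held constant, no variation) is passed in as a set of node names, and only nodes on the path itself are checked, ignoring the fuller d-separation rule that controlling for a descendant of a collider also opens the path. The function names are hypothetical.

```r
# Does the DAG (an edge list with columns from, to) contain the arrow from -> to?
has_edge <- function(edges, from, to) any(edges$from == from & edges$to == to)

# Is a single path open, given the set of controlled ("held constant") nodes?
path_is_open <- function(path, edges, controlled = character(0)) {
  if (length(path) <= 2) return(TRUE)  # a direct arrow has no interior nodes to block it
  for (i in 2:(length(path) - 1)) {
    v <- path[i]
    is_collider <- has_edge(edges, path[i - 1], v) && has_edge(edges, path[i + 1], v)
    if (!is_collider && v %in% controlled) return(FALSE)   # controlled non-collider closes it
    if (is_collider && !(v %in% controlled)) return(FALSE) # uncontrolled collider closes it
  }
  TRUE
}

confounding <- data.frame(from = c("Z", "Z", "X"), to = c("X", "Y", "Y"))
collision   <- data.frame(from = c("X", "X", "Y"), to = c("Y", "Z", "Z"))

path_is_open(c("X", "Z", "Y"), confounding)                    # TRUE
path_is_open(c("X", "Z", "Y"), confounding, controlled = "Z")  # FALSE
path_is_open(c("X", "Z", "Y"), collision)                      # FALSE
path_is_open(c("X", "Z", "Y"), collision, controlled = "Z")    # TRUE
```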

Finding Paths

[DAG: Z ⟶ X, Z ⟶ Y, X ⟶ Y]

  • \(X\) causes \(Y\)
  • \(Z\) causes both \(X\) and \(Y\)
  • \(Z\) confounds the \(X⟶Y\) association
  • Paths between \(X\) and \(Y\):
    • \(X⟶Y\)
    • \(X⟵Z⟶Y\)
  • \(X⟵Z⟶Y\) is a backdoor path
  • Even if there were no \(X⟶Y\) arrow, \(Z\) would still connect \(X\) and \(Y\), so they would be associated
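A quick simulation of the last bullet, with hypothetical coefficients: \(Z\) drives both \(X\) and \(Y\), and there is no \(X⟶Y\) arrow at all, yet a regression of \(Y\) on \(X\) finds a clear slope until \(Z\) is held constant.

```r
set.seed(2023)
n <- 20000
z <- rnorm(n)
x <- 0.8 * z + rnorm(n)  # Z -> X
y <- 0.8 * z + rnorm(n)  # Z -> Y; no X -> Y arrow in this DGP

b_naive <- unname(coef(lm(y ~ x))["x"])      # clearly non-zero: backdoor X <- Z -> Y is open
b_adj   <- unname(coef(lm(y ~ x + z))["x"])  # close to 0 once Z is held constant
round(c(naive = b_naive, adjusted_for_z = b_adj), 3)
```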

Finding Paths: Campaign Money Example

[DAG: Money Raised ⟶ Total Votes, Candidate Quality ⟶ Money Raised, Candidate Quality ⟶ Total Votes]

Paths between Money and Votes:

  1. Money ⟶ Total Votes
  2. Money ⟵ Candidate Quality ⟶ Total Votes
  • Accounting for Quality closes the backdoor
  • In other words, we:
    • compare candidates as if they had the same Quality
    • remove differences that are predicted by Quality
    • hold Quality constant
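A sketch of the campaign-money example with simulated data and made-up coefficients (true effect of Money on Votes set to 1). It also illustrates the second bullet: adjusting for Quality in a multiple regression gives the same Money coefficient as first removing the part of Money and of Votes predicted by Quality and then regressing the residuals on each other (the Frisch-Waugh-Lovell theorem).

```r
set.seed(5)
n <- 20000
quality <- rnorm(n)                            # hypothetical candidate quality
money   <- 2 * quality + rnorm(n)              # Quality -> Money
votes   <- 1 * money + 3 * quality + rnorm(n)  # Money -> Votes (true effect 1), Quality -> Votes

b_naive <- unname(coef(lm(votes ~ money))["money"])            # biased upward by the backdoor
b_adj   <- unname(coef(lm(votes ~ money + quality))["money"])  # close to 1

# "Remove differences predicted by Quality": residualize both variables on quality
money_res <- resid(lm(money ~ quality))
votes_res <- resid(lm(votes ~ quality))
b_res <- unname(coef(lm(votes_res ~ money_res))["money_res"])  # same as b_adj

round(c(naive = b_naive, adjusted = b_adj, residualized = b_res), 3)
```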

Finding Paths: A More Complex DAG

List all paths that connect Money raised with Total Votes (regardless of the direction of arrows). Which of them are backdoor paths?

[DAG: Candidate quality ⟶ Money raised; Candidate quality ⟶ Total votes; Money raised ⟶ Hire campaign manager; Hire campaign manager ⟶ Total votes; Money raised ⟶ Total votes; Money raised ⟶ Won election; Total votes ⟶ Won election; District ⟶ Money raised; District ⟶ Total votes; History ⟶ District; History ⟶ Party; Party ⟶ Money raised; Party ⟶ Total votes]

  1. Money ⟶ Total Votes
  2. Money ⟶ Hire campaign manager ⟶ Total Votes
  3. Money ⟶ Won Election ⟵ Total Votes
  4. Money ⟵ Candidate Quality ⟶ Total Votes
  5. Money ⟵ District ⟶ Total Votes
  6. Money ⟵ Party ⟶ Total Votes
  7. Money ⟵ District ⟵ History ⟶ Party ⟶ Total Votes
  8. Money ⟵ Party ⟵ History ⟶ District ⟶ Total Votes
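The listing above can be reproduced mechanically. A base R sketch (node names shortened, helper functions hypothetical; the dagitty R package also provides a `paths()` function for this) that stores the DAG as an edge list, finds every path between Money and Votes regardless of arrow direction, and labels each one frontdoor (all arrows point away from Money) or backdoor, following the definitions in the Paths Glossary:

```r
# Campaign DAG as an edge list: each row is one arrow, from -> to
edges <- data.frame(
  from = c("Quality", "Quality", "Money", "Manager", "Money", "Money", "Votes",
           "District", "District", "History", "History", "Party", "Party"),
  to   = c("Money", "Votes", "Manager", "Votes", "Votes", "Won", "Won",
           "Money", "Votes", "District", "Party", "Money", "Votes"),
  stringsAsFactors = FALSE
)

is_arrow <- function(a, b) any(edges$from == a & edges$to == b)

# Neighbours of a node, ignoring the direction of the arrows
neighbours <- function(node) unique(c(edges$to[edges$from == node], edges$from[edges$to == node]))

# Depth-first search: every path from `current` to `target` that visits no node twice
all_paths <- function(current, target, visited = current) {
  if (current == target) return(list(visited))
  out <- list()
  for (nb in setdiff(neighbours(current), visited)) {
    out <- c(out, all_paths(nb, target, c(visited, nb)))
  }
  out
}

# Frontdoor: every consecutive step along the path follows an arrow forward
is_frontdoor <- function(p) all(mapply(is_arrow, head(p, -1), tail(p, -1)))

# Pretty-print a path with its arrow directions
show_path <- function(p) {
  s <- p[1]
  for (i in seq_len(length(p) - 1)) {
    s <- paste0(s, if (is_arrow(p[i], p[i + 1])) " -> " else " <- ", p[i + 1])
  }
  s
}

paths <- all_paths("Money", "Votes")
frontdoor <- vapply(paths, is_frontdoor, logical(1))
data.frame(path = vapply(paths, show_path, character(1)),
           type = ifelse(frontdoor, "frontdoor", "backdoor"))
```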

Closing Backdoor Paths Is the Goal

[DAG: same campaign-finance diagram as on the previous slide]


Frontdoor Paths:

  1. Money ⟶ Total Votes
  2. Money ⟶ Hire campaign manager ⟶ Total Votes

Closed Backdoor Path:

  1. Money ⟶ Won Election ⟵ Total Votes

Open Backdoor Paths:

  1. Money ⟵ Candidate Quality ⟶ Total Votes
  2. Money ⟵ District ⟶ Total Votes
  3. Money ⟵ Party ⟶ Total Votes
  4. Money ⟵ District ⟵ History ⟶ Party ⟶ Total Votes
  5. Money ⟵ Party ⟵ History ⟶ District ⟶ Total Votes
  • Adjusting for Quality, District, and Party closes all open backdoors ⟶ Yes!
  • History is unobserved, but once District and Party are adjusted for, it no longer confounds Money and Votes.
  • Adjusting for Won Election (a collider) opens the closed backdoor path through it ⟶ No!
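A simulation sketch of these three claims, with every coefficient hypothetical and History generated but left out of the regressions (as if unobserved). Manager is not adjusted for because it lies on a frontdoor path, so the target is the total effect 1 + 0.5 × 1 = 1.5.

```r
set.seed(10)
n <- 100000
history  <- rnorm(n)  # generated, but never used in a regression (unobserved)
district <- 0.8 * history + rnorm(n)
party    <- 0.8 * history + rnorm(n)
quality  <- rnorm(n)
money    <- 0.5 * quality + 0.5 * district + 0.5 * party + rnorm(n)
manager  <- 0.5 * money + rnorm(n)
votes    <- money + manager + quality + district + party + rnorm(n)
won      <- as.numeric(votes + money + rnorm(n) > 0)  # collider: Money -> Won <- Votes

b_naive <- unname(coef(lm(votes ~ money))["money"])                              # backdoors open
b_adj   <- unname(coef(lm(votes ~ money + quality + district + party))["money"]) # close to 1.5
b_won   <- unname(coef(lm(votes ~ money + quality + district + party + won))["money"])  # biased
round(c(naive = b_naive, adjusted = b_adj, adjusted_plus_won = b_won), 2)
```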

Experiments Close Backdoors

Observational Study

[DAG: the full campaign-finance diagram from the previous slides, including the arrows from Candidate quality, District, and Party into Money raised]

Experimental Study

[DAG: as in the observational study, but with no arrows into Money raised: Candidate quality ⟶ Total votes; Money raised ⟶ Hire campaign manager; Hire campaign manager ⟶ Total votes; Money raised ⟶ Total votes; Money raised ⟶ Won election; Total votes ⟶ Won election; District ⟶ Total votes; History ⟶ District; History ⟶ Party; Party ⟶ Total votes]
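Simulating the experimental version of a hypothetical campaign DGP: Money is randomly assigned, so nothing points into it, and a plain regression of Votes on Money recovers the total effect of 1.5 (1 direct + 0.5 via the campaign manager) without any controls.

```r
set.seed(11)
n <- 100000
history  <- rnorm(n)
district <- 0.8 * history + rnorm(n)
party    <- 0.8 * history + rnorm(n)
quality  <- rnorm(n)
money    <- rnorm(n)  # randomized: independent of quality, district, and party
manager  <- 0.5 * money + rnorm(n)
votes    <- money + manager + quality + district + party + rnorm(n)

b_exp <- unname(coef(lm(votes ~ money))["money"])  # close to 1.5 with no adjustment
round(b_exp, 2)
```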

Backdoor Criterion Comes from \(do\)-calculus

  • By applying a set of logical rules called \(do\)-calculus, we can strip away confounding relationships and isolate the effect of interest in observational data
  • The \(do()\) operator represents a direct intervention in a DAG: it sets a node to a particular value (e.g., Money raised = $10,000), as if we were running an experiment
  • We cannot estimate \(E[\text{Total Votes} | do(\text{Money raised})]\) directly from observational data; we can only estimate \(E[\text{Total Votes} | \text{Money raised}]\), and in general \(E[\text{Total Votes} | do(\text{Money raised})] \neq E[\text{Total Votes} | \text{Money raised}]\)
  • Applying the rules of \(do\)-calculus allows us to remove \(do()\) and make the causal effect identifiable (i.e., isolated)

If you can transform \(do()\) expressions to \(do\)-free versions, you can legally make causal inferences from observational data
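For the simple confounding DAG (Z ⟶ X, Z ⟶ Y, X ⟶ Y), the \(do\)-free expression that \(do\)-calculus yields is the backdoor adjustment formula \(E[Y | do(X = x)] = \sum_z E[Y | X = x, Z = z] \, P(Z = z)\). A sketch with simulated binary \(X\) and \(Z\) (hypothetical probabilities; true effect of \(X\) on \(Y\) set to 2) comparing it with the naive observational contrast:

```r
set.seed(12)
n <- 200000
z <- rbinom(n, 1, 0.5)                       # binary confounder
x <- rbinom(n, 1, ifelse(z == 1, 0.8, 0.2))  # Z -> X
y <- 2 * x + 3 * z + rnorm(n)                # true effect of X on Y is 2

# Observational contrast: E[Y | X = 1] - E[Y | X = 0]
naive <- mean(y[x == 1]) - mean(y[x == 0])

# Backdoor adjustment: average the Z-specific means, weighted by P(Z = z)
e_do <- function(xval) {
  sum(sapply(0:1, function(zval) mean(y[x == xval & z == zval]) * mean(z == zval)))
}
adjusted <- e_do(1) - e_do(0)
round(c(naive = naive, adjusted = adjusted), 2)
```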

Main Take Aways

  • Causal diagrams sketch the DGP and allow us to see if the effect of interest is identifiable, i.e. that we can isolate the path(s) that refer to our effect
  • There will often be more than one appropriate DAG for the same question
  • Backdoor paths create systematic, noncausal correlations between the causal variable of interest and the outcome you are trying to study, and we need to close them
  • If you assume your DAG is right and you closed all backdoors, you can legally make causal inferences from observational data