Linear Regression

R for Data Analysis
Session 7

Viktoriia Semenova

University of Mannheim
Fall 2023

Intro

Agenda for Today


  • Fitting the line with OLS
  • Interpretation of Regression Coefficients

Cookies and Happiness: OLS Regression Line

Regression Line Anatomy

Fitting and Interpreting Models

Language of Models

  • True Model \(y = \underbrace{\beta_0}_{\text{intercept}} + \underbrace{\beta_1}_{\text{slope}} x + \underbrace{\varepsilon}_{\text{error}}\)
    • Population parameters \(\beta\): truth (estimand), unknown to us
  • Estimated Model \(y = \underbrace{\hat\beta_0}_{\text{intercept}} + \underbrace{\hat\beta_1}_{\text{slope}} x + \underbrace{\hat\varepsilon}_{\text{residual}}\)
    • Estimates \(\hat{\beta}\): our best guess about the estimand given the data
  • A model has two parts:
    • Systematic Component of a linear model: \(\underbrace{\hat y}_{\text{fitted value}} = \underbrace{\hat\beta_0}_{\text{intercept}} + \underbrace{\hat\beta_1}_{\text{slope}} x\)
    • Stochastic Component of a linear model: \(\hat\varepsilon\)
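
A minimal sketch of the two components in R: we simulate data from a known true model (the values \(\beta_0 = 1\), \(\beta_1 = 0.5\) are made up for illustration) and then estimate it.

```r
set.seed(123)
n <- 100
x <- runif(n, 0, 10)
eps <- rnorm(n, mean = 0, sd = 1)   # stochastic component: error
y <- 1 + 0.5 * x + eps              # true model: beta_0 + beta_1 * x + error

fit <- lm(y ~ x)       # estimated model
coef(fit)              # estimates beta-hat: close to the true 1 and 0.5
head(fitted(fit))      # systematic component: fitted values y-hat
head(residuals(fit))   # stochastic component: residuals epsilon-hat
```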

Vocabulary

\[ y = \hat \beta_0 + \hat \beta_1 x_1 + \hat \varepsilon \]

  • \(y\): dependent variable, outcome
  • \(x\): independent variable, treatment, explanatory variable, predictor, feature
  • \(\hat y\): predicted values of y, y-hat, fitted values, regression line
  • \(\hat \beta_0\): intercept, prediction when all \(x=0\), constant
  • \(\hat \beta_k\): slope, the effect of the \(k\)-th variable
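
The same vocabulary mapped onto base-R accessors; the built-in mtcars data (mpg as \(y\), wt as \(x\)) serves only as stand-in data here.

```r
fit <- lm(mpg ~ wt, data = mtcars)

head(mtcars$mpg)       # y: dependent variable, outcome
head(mtcars$wt)        # x: independent variable, predictor
coef(fit)[1]           # beta-hat_0: intercept, constant
coef(fit)[2]           # beta-hat_1: slope on wt
head(fitted(fit))      # y-hat: fitted values, the regression line
head(residuals(fit))   # epsilon-hat: residuals
```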

Interpreting Slope Coefficient Poll

\[ \operatorname{\widehat{Happiness}} = 1.16 + 0.155(\operatorname{Number\ of\ Cookies\ Eaten}) \]

The slope of the model predicting the happiness score from the number of cookies eaten is 0.155. Which of the following is the best interpretation of this value?

  1. For every additional cookie eaten, the happiness score goes up by 0.155 points, on average.
  2. For every additional cookie eaten, we expect the happiness score to be higher by 0.155 points, on average.
  3. For every additional cookie eaten, the happiness score goes up by 0.155 points.
  4. For every one point increase in happiness score, the number of cookies eaten goes up by 0.155 points, on average.

Interpreting Slope and Intercept

\[ \operatorname{\widehat{Happiness}} = 1.16 + 0.155(\operatorname{Number\ of\ Cookies\ Eaten}) \]

Slope: For every additional cookie eaten, we expect the happiness score to be higher by 0.155 points, on average.

  • Each additional cookie has the same effect on happiness, i.e., the marginal effect is constant
    • The associated increase in happiness is 0.155 whether it is the first cookie or, say, the tenth

Intercept: If the number of cookies eaten is 0, we expect the happiness score to be 1.16 points.

  • The intercept is meaningful in the context of the data because the predictor can feasibly take values at or near zero
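
Both interpretations can be checked by plugging values into the fitted equation; a quick sketch using the coefficients from the slide:

```r
happiness_hat <- function(cookies) 1.16 + 0.155 * cookies

happiness_hat(0)                      # 1.16: the intercept, prediction at x = 0
happiness_hat(1) - happiness_hat(0)   # 0.155: increase for the first cookie
happiness_hat(10) - happiness_hat(9)  # 0.155: same marginal effect for the tenth
```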

Mechanics of Linear Regression

Explained vs. Unexplained Variation in Y

Fitting Line Example I

\[ \begin{aligned} &\widehat{\text{Happiness}} = 2 - 0.1 \cdot \text{Cookies} \end{aligned} \]

Fitting Line Example II

\[ \begin{aligned} &\widehat{\text{Happiness}} = 2 + 0 \cdot \text{Cookies} = \overline{\text{Happiness}} \end{aligned} \]

Fitting Line Example III

\[ \begin{aligned} &\widehat{\text{Happiness}} = 1.1 + 0.1 \cdot \text{Cookies} \end{aligned} \]

Fitting Line Example IV: OLS Solution

\[ \begin{aligned} &\widehat{\text{Happiness}} = 1.16 + 0.155 \cdot \text{Cookies} \end{aligned} \]
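
The cookies data itself is not printed on these slides, so the sketch below simulates placeholder data; the point is only that, whatever the data, the OLS fit attains a smaller sum of squared residuals than any hand-picked line, including those from Examples I–III.

```r
set.seed(42)
cookies   <- sample(0:6, 30, replace = TRUE)
happiness <- 1.16 + 0.155 * cookies + rnorm(30, sd = 0.3)  # simulated stand-in

# Sum of squared residuals for a candidate line with intercept b0 and slope b1
rss <- function(b0, b1) sum((happiness - b0 - b1 * cookies)^2)

rss(2.0, -0.1)   # Example I
rss(2.0,  0.0)   # Example II: flat line
rss(1.1,  0.1)   # Example III
ols <- coef(lm(happiness ~ cookies))
rss(ols[1], ols[2])   # OLS solution: the smallest RSS of all
```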

Explained vs. Unexplained Variation in Y

Explained Variance (Sum of Squares): \[ESS = \sum^{n}_{i=1}(\hat y_i - \bar y)^2\]

Sum of Squared Residuals: \[RSS = \sum^n_{i=1}\hat{\varepsilon}_i^2 = \sum^n_{i=1}(y_i - \hat{y}_i)^2\]

Total Sum of Squares: \[TSS = \sum^{n}_{i=1}(y_i - \bar y)^2 = ESS + RSS\]
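
A quick numerical check that the decomposition \(TSS = ESS + RSS\) holds for an OLS fit with an intercept, here with mtcars as stand-in data:

```r
fit <- lm(mpg ~ wt, data = mtcars)
y   <- mtcars$mpg

TSS <- sum((y - mean(y))^2)            # total variation in y
ESS <- sum((fitted(fit) - mean(y))^2)  # variation explained by the line
RSS <- sum(residuals(fit)^2)           # leftover, unexplained variation

all.equal(TSS, ESS + RSS)   # TRUE
```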

What Are Coefficient Values in OLS

\[\begin{aligned} \text{RSS} &= \sum^n_{i=1}\hat{\varepsilon}_i^2 \\ &= \sum^n_{i=1}(y_i - \hat{y}_i)^2 \\ &= \sum^n_{i=1}(y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i)^2 \end{aligned}\]

  • The OLS estimator finds the values of \(\hat\beta\) that minimize \(RSS\), the unexplained variation
  • We use differential calculus to find these values of \(\hat{\beta}\) in closed form
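
The same minimization can also be done numerically rather than by calculus: a sketch with optim() on mtcars as stand-in data, which recovers (nearly) the same coefficients as lm().

```r
x <- mtcars$wt
y <- mtcars$mpg

# RSS as a function of the coefficient vector b = (b0, b1)
rss <- function(b) sum((y - b[1] - b[2] * x)^2)

optim(c(0, 0), rss)$par   # numerical minimizer of RSS
coef(lm(y ~ x))           # closed-form OLS solution: essentially identical
```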

Properties of Least Squares Regression

  • The regression line passes through the center of mass of the data, the point with coordinates given by the averages of \(X\) and \(Y\), \((\bar{X}, \bar{Y})\):

\[\bar{Y} = \hat \beta_0 + \hat \beta_1 \bar{X} ~ \rightarrow ~ \hat \beta_0 = \bar{Y} - \hat \beta_1 \bar{X}\]

  • The slope has the same sign as the correlation coefficient: \(\hat\beta_1 = \operatorname{Corr}(X,Y) \dfrac{{\sigma_Y}}{{\sigma_X}}\)

  • The sum of the residuals is zero (by design): \(\sum_{i = 1}^n \hat\varepsilon_i = 0\)
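
All three properties are easy to verify on any fitted model; below, a check with mtcars as stand-in data.

```r
fit <- lm(mpg ~ wt, data = mtcars)
b   <- coef(fit)

# 1. The line passes through (x-bar, y-bar)
b[1] + b[2] * mean(mtcars$wt)   # equals mean(mtcars$mpg)
mean(mtcars$mpg)

# 2. The slope equals Corr(X, Y) * sd(Y) / sd(X)
cor(mtcars$wt, mtcars$mpg) * sd(mtcars$mpg) / sd(mtcars$wt)
b[2]

# 3. The residuals sum to zero (up to floating-point error)
sum(residuals(fit))
```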