R for Data Analysis
Session 7
University of Mannheim
Fall 2023
Fitting the line with OLS
Interpretation of Regression Coefficients
\[ y = \hat \beta_0 + \hat \beta_1 x_1 + \hat \varepsilon \]
\[ \operatorname{\widehat{Happiness}} = 1.16 + 0.155(\operatorname{Number\ of\ Cookies\ Eaten}) \]
The slope of the model for predicting happiness score from number of consumed cookies is 0.155. Which of the following is the best interpretation of this value?
\[ \operatorname{\widehat{Happiness}} = 1.16 + 0.155(\operatorname{Number\ of\ Cookies\ Eaten}) \]
Slope: For every additional cookie eaten, we expect the happiness score to be higher by 0.155 points, on average.
Intercept: If the number of cookies eaten is 0, we expect the happiness score to be 1.16 points, on average.
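In R, a model like this is fit with `lm()`. The sketch below uses simulated data (the variable names and the true coefficients are illustrative, so the estimates will only approximate the 1.16 and 0.155 from the slide):

```r
# Simulated data: happiness score as a function of cookies eaten
# (illustrative only; estimates will not match the slide exactly).
set.seed(42)
cookies   <- sample(0:10, 50, replace = TRUE)
happiness <- 1.16 + 0.155 * cookies + rnorm(50, sd = 0.5)

fit <- lm(happiness ~ cookies)
coef(fit)  # intercept and slope, interpreted as on this slide
```

`coef(fit)` returns the estimated intercept and slope, which are read exactly as in the interpretations above.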
\[ \begin{aligned} &\widehat{\text{Happiness}} = 2 - 0.1 \cdot \text{Cookies} \end{aligned} \]
\[ \begin{aligned} &\widehat{\text{Happiness}} = 2 + 0 \cdot \text{Cookies} = \overline{\text{Happiness}} \end{aligned} \]
\[ \begin{aligned} &\widehat{\text{Happiness}} = 1.1 + 0.1 \cdot \text{Cookies} \end{aligned} \]
\[ \begin{aligned} &\widehat{\text{Happiness}} = 1.16 + 0.155 \cdot \text{Cookies} \end{aligned} \]
Explained Variance (Sum of Squares): \[ESS = \sum^{n}_{i=1}(\hat y_i - \bar y)^2\]
Sum of Squared Residuals: \[RSS = \sum^n_{i=1}\hat{\varepsilon_i}^2 = \sum^n_{i=1}(y_i - \hat{y_i})^2\]
Total Sum of Squares: \[TSS = \sum^{n}_{i=1}(y_i - \bar y)^2 = ESS + RSS\]
Sum of Squared Residuals: \[\begin{aligned}RSS &= \sum^n_{i=1}\hat{\varepsilon}_i^2 \\ &= \sum^n_{i=1}(y_i - \hat{y}_i)^2 \\ &= \sum^n_{i=1}(y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i)^2\end{aligned}\]
\[\bar{y} = \hat \beta_0 + \hat \beta_1 \bar{x} ~ \rightarrow ~ \hat \beta_0 = \bar{y} - \hat \beta_1 \bar{x}\]
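The fitted line always passes through the point of means \((\bar x, \bar y)\), so the intercept formula above can be verified directly in R (simulated data for illustration):

```r
# The fitted line passes through (x-bar, y-bar): intercept = mean(y) - slope * mean(x)
set.seed(3)
x <- rnorm(25, mean = 5)
y <- 1 + 2 * x + rnorm(25)
fit <- lm(y ~ x)

b0 <- unname(coef(fit)[1])
b1 <- unname(coef(fit)[2])
all.equal(b0, mean(y) - b1 * mean(x))  # TRUE
```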
The slope has the same sign as the correlation coefficient: \(\hat \beta_1 = \operatorname{Corr}(X,Y) \dfrac{{\sigma_Y}}{{\sigma_X}}\)
The sum of the residuals is zero (a consequence of including an intercept): \(\sum_{i = 1}^n \hat \varepsilon_i = 0\)
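Both algebraic properties above can be confirmed numerically on any OLS fit with an intercept (simulated data for illustration):

```r
# Check two OLS properties: slope = Corr(x, y) * sd(y) / sd(x),
# and the residuals sum to zero.
set.seed(7)
x <- runif(40, 0, 10)
y <- 3 + 1.5 * x + rnorm(40)
fit <- lm(y ~ x)

slope <- unname(coef(fit)["x"])
all.equal(slope, cor(x, y) * sd(y) / sd(x))  # TRUE
all.equal(sum(residuals(fit)), 0)            # TRUE
```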