A correlation quantifies the linear association between two variables. From one perspective, a correlation has two parts: one part quantifies the association, and the other part sets the scale of that association.

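The first part (the covariance, also the correlation numerator) equates to a sort of “average sum of squares” of two variables, with products of X and Y deviations in place of squared deviations:

\[
\mathrm{cov}_{XY} = \frac{\sum_{i=1}^{N} (X_i - \bar{X})(Y_i - \bar{Y})}{N - 1}
\]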

It could be easier to interpret the covariance as an “average of the X-Y matches”: Deviations of X scores above the X mean multiplied by deviations of Y scores below the Y mean will be negative, and deviations of X scores above the X mean multiplied by deviations of Y scores above the Y mean will be positive. More “mismatches” lead to a negative covariance and more “matches” lead to a positive covariance.
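The “matches” idea can be checked by hand. The sketch below uses made-up toy vectors (`x` and `y` are assumptions for illustration, not data from this tutorial) and assumes the usual sample covariance with an N - 1 denominator, which is what R's `cov()` computes.

```r
# Toy data (hypothetical values chosen only for illustration)
x <- c(2, 4, 6, 8, 10)
y <- c(1, 3, 2, 5, 4)

# Deviations of each score from its own variable's mean
dev_x <- x - mean(x)
dev_y <- y - mean(y)

# Positive products are "matches" (both deviations share a sign);
# negative products are "mismatches" (the deviations have opposite signs)
cross_products <- dev_x * dev_y

# Sample covariance: sum of the cross-products divided by N - 1
cov_by_hand <- sum(cross_products) / (length(x) - 1)

cov_by_hand
cov(x, y)  # built-in function, for comparison
```

Both values should agree, and none of the cross-products is negative here, which is why the covariance comes out positive.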

The second part—the product of the standard deviations, also the correlation denominator—restricts the association to values from -1.00 to 1.00.

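Divide the numerator by the denominator and you get a sort of “ratio of the sum of squares”, the Pearson correlation coefficient:

\[
r_{XY} = \frac{\mathrm{cov}_{XY}}{s_X s_Y} = \frac{\sum_{i=1}^{N} (X_i - \bar{X})(Y_i - \bar{Y})}{\sqrt{\sum_{i=1}^{N} (X_i - \bar{X})^2} \sqrt{\sum_{i=1}^{N} (Y_i - \bar{Y})^2}}
\]

Here \(\mathrm{cov}_{XY}\) is the covariance of X and Y, and \(s_X\) and \(s_Y\) are their standard deviations.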

Square this “standardized covariance” for an estimate of the proportion of variance of Y that can be accounted for by a linear function of X, \(R^2_{XY}\).
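As a rough check of both claims, the sketch below standardizes a covariance by hand and compares its square with the \(R^2\) from a bivariate regression. The toy vectors `x` and `y` are made-up values, not part of the tutorial's data.

```r
# Toy data (hypothetical values chosen only for illustration)
x <- c(2, 4, 6, 8, 10)
y <- c(1, 3, 2, 5, 4)

# Numerator: the covariance; denominator: product of the standard deviations
r_by_hand <- cov(x, y) / (sd(x) * sd(y))

# Squared correlation, to compare with R-squared from regressing y on x
r_squared <- r_by_hand^2
fit <- lm(y ~ x)

r_by_hand
cor(x, y)               # built-in Pearson correlation, for comparison
r_squared
summary(fit)$r.squared  # R-squared from the bivariate regression
```

With a single predictor, the squared correlation and the regression \(R^2\) should be the same number.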

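By the way, the correlation equation is very similar to the bivariate linear regression beta coefficient (slope) equation. The only difference is in the denominator, which excludes the Y variance:

\[
b_{YX} = \frac{\mathrm{cov}_{XY}}{s_X^2} = \frac{\sum_{i=1}^{N} (X_i - \bar{X})(Y_i - \bar{Y})}{\sum_{i=1}^{N} (X_i - \bar{X})^2}
\]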
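A quick way to see the similarity is to compute the slope from the covariance and the X variance, then compare it with what `lm()` estimates. This is a minimal sketch on made-up toy vectors (`x` and `y` are assumptions for illustration).

```r
# Toy data (hypothetical values chosen only for illustration)
x <- c(2, 4, 6, 8, 10)
y <- c(1, 3, 2, 5, 4)

# Slope: covariance divided by the variance of X alone (no Y variance)
slope_by_hand <- cov(x, y) / var(x)

# Slope estimated by ordinary least squares
fit <- lm(y ~ x)
slope_lm <- coef(fit)[["x"]]

slope_by_hand
slope_lm

# Rescaling the slope by sd(x) / sd(y) recovers the correlation
slope_by_hand * sd(x) / sd(y)
cor(x, y)
```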

An adjusted correlation (often called a semipartial or part correlation) refers to the square root of the change in a regression model’s \(R^2\) after adding a single predictor to the model: \(\sqrt{R^2_{full} - R^2_{reduced}}\). The change itself quantifies that additional predictor’s “unique” contribution to observed variance explained. Put another way, it quantifies observed variance in Y explained by a linear function of X after removing variance shared between X and the other predictors in the model.
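Both descriptions can be compared numerically. The sketch below is an illustration only: it uses R's built-in `mtcars` data rather than this tutorial's dataset, and the choice of `mpg` as the outcome, `wt` as the existing predictor, and `hp` as the added predictor is arbitrary.

```r
# Reduced model: outcome predicted from the existing predictor only
fit_reduced <- lm(mpg ~ wt, data = mtcars)

# Full model: same outcome after adding a single predictor (hp)
fit_full <- lm(mpg ~ wt + hp, data = mtcars)

# Change in R-squared attributable to the added predictor
r2_change <- summary(fit_full)$r.squared - summary(fit_reduced)$r.squared

# Square root of the change (this drops the sign of the association)
adjusted_r <- sqrt(r2_change)

# "Put another way": strip from hp the variance it shares with wt,
# then correlate the outcome with what is left of hp
hp_unique <- resid(lm(hp ~ wt, data = mtcars))
r_with_unique_part <- cor(mtcars$mpg, hp_unique)

adjusted_r
r_with_unique_part  # same magnitude; the sign shows the direction
```

The two values should match in absolute value, because the square root of an \(R^2\) change cannot carry a sign.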

Correct functional form. Your model variables share linear relationships.

No omitted influences. This one is hard: Your model accounts for all relevant influences on the variables included. All models are wrong, but how wrong is yours?

Accurate measurement. Your measurements are valid and reliable. Note that unreliable measures can’t be valid, and reliable measures don’t necessarily measure just one construct or even your construct.

Well-behaved residuals. Residuals (i.e., prediction errors) aren’t correlated with predictor variables or each other, and residuals have constant variance across values of your predictor variables.
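One common, informal way to look at the residual assumption is a residuals-versus-fitted plot. The sketch below is only an illustration on R's built-in `mtcars` data (the model `mpg ~ wt` is an arbitrary assumption, not a model from this tutorial).

```r
# Fit a simple linear model on a built-in dataset (illustration only)
fit <- lm(mpg ~ wt, data = mtcars)
res <- resid(fit)

# In-sample least-squares residuals are uncorrelated with the predictors
# by construction, so this value is ~0 and cannot reveal a violation
cor_res_predictor <- cor(res, mtcars$wt)
cor_res_predictor

# A plot is more informative: curvature hints at a wrong functional form,
# and a funnel shape hints at non-constant residual variance
plot(fitted(fit), res,
     xlab = "Fitted values", ylab = "Residuals")
abline(h = 0, lty = 2)
```

A shapeless cloud around the dashed zero line is what well-behaved residuals tend to look like; the plot is a judgment aid, not a formal test.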

```
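# Load packages (install any that are missing with install.packages())
library(tidyverse)  # data wrangling and plotting (dplyr, ggplot2, ...)
library(knitr)      # kable() for printing tables
library(effects)
library(psych)
library(candisc)    # provides the HSB dataset

# Other attached packages can mask these dplyr functions, so point the
# short names back at the dplyr versions
select <- dplyr::select
recode <- dplyr::recode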
```

From `help("HSB")`: “The High School and Beyond Project was a longitudinal study of students in the U.S. carried out in 1980 by the National Center for Education Statistics. Data were collected from 58,270 high school students (28,240 seniors and 30,030 sophomores) and 1,015 secondary schools. The HSB data frame is sample of 600 observations, of unknown characteristics, originally taken from Tatsuoka (1988).”

```
HSB <- as_tibble(HSB)
# print a random subset of rows from the dataset
HSB %>% sample_n(size = 15) %>% kable()
```

| id | gender | race | ses | sch | prog | locus | concept | mot | career | read | write | math | sci | ss |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 355 | female |  |  |  |  |  |  |  |  |  |  |  |  |  |
| 222 | male | white | low | public | general | -0.40 | -0.89 | 0.33 | operative | 41.6 | 56.7 | 45.0 | 50.4 | 43.1 |
| 575 | female | white | middle | private | general | 0.68 | 0.03 | 0.00 | prof2 | 44.2 | 35.9 | 43.6 | 47.1 | 40.6 |
| 335 |  |  |  |  |  | 0.93 | 0.34 | 1.00 | prof2 | 46.9 | 54.1 | 54.6 | 55.3 | 50.6 |
| 42 | male | hispanic | middle | public | academic | 0.03 | 0.28 | 0.33 | operative | 52.1 | 54.1 | 52.0 | 55.3 | 50.6 |
| 413 | female | white | middle | public | general | 0.46 | 0.34 | 0.67 | prof1 | 62.7 | 61.9 | 52.9 | 44.4 | 40.6 |
| 221 | male | white | low | public | general | -0.36 | -1.67 | 1.00 | farmer | 44.2 | 43.7 | 56.4 | 58.0 | 60.5 |
| 254 | female | white | high | public | academic | 0.48 | -0.47 | 0.33 | prof1 | 52.1 | 61.9 | 55.5 | 60.7 | 60.5 |
| 463 | male | white | middle | public | general | -0.11 | 0.25 | 1.00 | prof1 | 44.2 | 44.3 | 45.6 | 39.0 | 50.6 |
| 294 | male | white | low | public | academic | -0.82 | -0.76 | 0.00 | clerical | 57.4 | 43.7 | 59.6 | 52.6 | 50.6 |
| 419 | male | white | middle | public | vocation | -0.19 | 0.03 | 0.33 | craftsman | 54.8 | 51.5 | 42.8 | 60.7 | 50.6 |
| 349 |  |  |  |  |  | 0.51 | 0.03 | 0.33 | manager | 60.1 | 51.5 | 53.9 | 63.4 | 50.6 |
| 486 | female | white | middle | public | academic | 0.53 | 0.81 | 0.67 | prof1 | 54.8 | 59.3 | 61.4 | 47.1 | 55.6 |
| 458 | male | white | middle | public | academic | 0.46 | 0.65 | 1.00 | technical | 49.5 | 48.9 | 60.5 | 55.3 | 55.6 |
| 137 | male | african-amer | middle | public | academic | -0.37 | -1.90 | 0.67 | manager | 54.8 | 36.5 | 37.7 | 49.8 | 60.5 |
| 200 | male | white | high | public | academic | -0.27 | 0.88 | 1.00 | sales | 52.1 | 64.5 | 60.6 | 60.7 | 45.6 |
| 440 | female | white | high | public | academic | 1.36 | 0.94 | 1.00 | homemaker | 52.1 | 48.9 | 51.3 | 41.7 | 45.6 |
| 367 | male | white | high | public | vocation | -1.50 | 0.03 | 0.67 | prof1 | 33.6 | 48.9 | 38.6 | 42.3 | 55.6 |
| 354 |  |  |  |  | general | 0.70 | -0.16 | 0.33 | prof1 | 68.0 | 59.3 | 55.7 | 63.4 | 65.5 |
| 277 | female | white | high | public | academic | -0.60 | -1.18 | 0.67 | clerical | 54.8 | 59.3 | 68.0 | 49.3 | 65.5 |
| 11 | female | hispanic | low | public | academic | 0.25 | 0.34 | 1.00 | prof1 | 49.5 | 61.9 | 42.9 | 41.7 | 50.6 |
| 390 | female | white | high | public | academic | 0.45 | 0.03 | 0.67 | prof1 | 60.1 | 61.9 | 51.9 | 53.1 | 58.1 |
| 307 | male |  |  |  |  | 1.11 | 0.34 | 1.00 | prof2 | 73.3 | 67.1 | 62.3 | 58.0 | 65.5 |
| 173 | female | white | low | public | general | -0.61 | 0.03 | 0.33 | proprietor | 44.2 | 54.1 | 40.3 | 52.6 | 40.6 |
| 524 | female | white | middle | private | academic | -0.66 | -1.07 | 0.67 | clerical | 49.5 | 61.9 | 60.4 | 47.1 | 50.6 |
| 264 | female | white | low | public | academic | 0.46 | 0.34 | 1.00 | prof1 | 76.0 | 52.1 | 64.1 | 63.9 | 60.5 |
| 523 | female | white | high | private | general | 0.68 | 0.32 | 1.00 | service | 36.3 | 56.7 | 41.9 | 49.8 | 40.6 |
| 522 | male | white | middle | private | academic | 0.00 | 0.65 | 1.00 | military | 52.1 | 61.9 | 62.1 | 58.0 | 60.5 |
|  |  |  |  |  | general | -0.80 | 0.15 | 0.33 | service | 41.6 | 41.1 | 39.5 | 47.1 | 60.5 |
| 404 | female | white | middle | public | general | -0.38 | -0.47 | 0.67 | homemaker | 62.7 | 43.7 | 44.7 | 52.6 | 41.9 |
| 493 | male | white | low | public | vocation | -0.86 | 0.28 | 1.00 | farmer | 36.3 | 48.9 | 54.4 | 60.7 | 35.6 |

`alpha` below refers to the points’ transparency (0.5 = 50%), `lm` refers to a linear model, and `se` refers to standard error bands.

```
HSB %>%
ggplot(mapping = aes(x = math, y = sci)) +
geom_point(alpha = 0.5) +
geom_smooth(method = "lm", se = FALSE, color = "red")
```