\(~\)

I recently discovered a wonderful package for easily checking linear regression assumptions via diagnostic plots: the check_model() function of the performance package.

Let’s make a quick demonstration of this package first.

# Load necessary libraries
library(performance)
library(see)
# Note: if you haven't installed the packages above, you'll need to install them first by using:
# install.packages("performance") and install.packages("see")

# Create a regression model (using data available in R by default)
model <- lm(mpg ~ wt * cyl + gear, data = mtcars)

# Check model assumptions
check_model(model)

Wonderful! And efficient. However, sometimes, for different reasons, someone might want to check assumptions with an objective test. Testing each assumption one by one is kind of time-consuming, so I made a convenience function to accelerate this process.

Just a warning before jumping into objective assumption tests. Note these tests are generally NOT recommended! With a large sample size, objective assumption tests will be OVER-sensitive to deviations from expected values, while with a small sample size, the objective assumption tests will be UNDER-powered to detect real, existing deviations. It can also mask other visual patterns not reflected in a single number. It is thus recommended to use visual assessment of diagnostic plots instead (e.g., see Kozak & Piepho, 2018; Schucany & Ng, 2006). For a quick and accessible argument against tests of normality specifically, see this source.

Getting Started

Load the function from my github:

source("https://raw.githubusercontent.com/RemPsyc/niceplots/master/niceAssFunction.R")

The raw output doesn’t look so nice in the console because the column names are so long, but it does look good in the viewer or when exported to a word processing software because the column names get wrapped. You can try it in Rstudio yourself with the following command:

*Warning:* running the function below for the first time will install and load the following
packages (if they are not already installed and loaded on your machine): lmtest and crayon.
Note: This will run many lines of code on your console and could take a few minutes.
View(niceAss(model))
## Interpretation: (p) values < .05 imply assumptions are not respected. 
## Diagnostic is how many assumptions are not respected for a given model or variable. 
## 

This function is particularly useful when testing several models simultaneously as it makes for a nice table of results. Let’s make a short demonstration of this.

# Define our dependent variables
(DV <- names(mtcars[-1]))
##  [1] "cyl"  "disp" "hp"   "drat" "wt"   "qsec" "vs"   "am"   "gear" "carb"
# Make list of all formulas
(formulas <- paste(DV, "~ mpg"))
##  [1] "cyl ~ mpg"  "disp ~ mpg" "hp ~ mpg"   "drat ~ mpg" "wt ~ mpg"  
##  [6] "qsec ~ mpg" "vs ~ mpg"   "am ~ mpg"   "gear ~ mpg" "carb ~ mpg"
# Make list of all models
models.list <- sapply(X = formulas, FUN = lm, data = mtcars, simplify = FALSE, USE.NAMES = TRUE)

# Make diagnostic table
ass.table <- do.call("rbind", lapply(models.list, niceAss))
## Interpretation: (p) values < .05 imply assumptions are not respected. 
## Diagnostic is how many assumptions are not respected for a given model or variable. 
##   
## Interpretation: (p) values < .05 imply assumptions are not respected. 
## Diagnostic is how many assumptions are not respected for a given model or variable. 
##   
## Interpretation: (p) values < .05 imply assumptions are not respected. 
## Diagnostic is how many assumptions are not respected for a given model or variable. 
##   
## Interpretation: (p) values < .05 imply assumptions are not respected. 
## Diagnostic is how many assumptions are not respected for a given model or variable. 
##   
## Interpretation: (p) values < .05 imply assumptions are not respected. 
## Diagnostic is how many assumptions are not respected for a given model or variable. 
##   
## Interpretation: (p) values < .05 imply assumptions are not respected. 
## Diagnostic is how many assumptions are not respected for a given model or variable. 
##   
## Interpretation: (p) values < .05 imply assumptions are not respected. 
## Diagnostic is how many assumptions are not respected for a given model or variable. 
##   
## Interpretation: (p) values < .05 imply assumptions are not respected. 
## Diagnostic is how many assumptions are not respected for a given model or variable. 
##   
## Interpretation: (p) values < .05 imply assumptions are not respected. 
## Diagnostic is how many assumptions are not respected for a given model or variable. 
##   
## Interpretation: (p) values < .05 imply assumptions are not respected. 
## Diagnostic is how many assumptions are not respected for a given model or variable. 
## 

Use the Viewer for better results (or export it to Word using my niceTable() function)

View(ass.table)

Categorical Predictors

If you have categorical predictors (e.g., groups), then it is suggested that you check assumptions directly on the raw data rather than on the residuals (source).

In this case, one can make qqplots on the different groups to first check the normality assumption.

QQ Plots

Load the function from my github:

source("https://raw.githubusercontent.com/RemPsyc/niceplots/master/niceQQFunction.R")

Make the basic plot

niceQQ(variable = "Sepal.Length",
       group = "Species",
       data = iris)

Generally speaking, as long as the dots lie within the confidence band, all is good and the distribute normally.

Further customization

Change (or reorder) colours, x-axis title or y-axis title, names of groups, no grid lines, or add the p-value from the Shapiro-Wilk (if you wish so).

niceQQ(variable = "Sepal.Length",
       group = "Species",
       data = iris,
       colours = c("#00BA38", "#619CFF", "#F8766D"),
       groups.labels = c("(a) Setosa", "(b) Versicolor", "(c) Virginica"),
       grid = FALSE,
       shapiro = TRUE,
       title = NULL)

A p > .05 suggests that your data is normally distributed (according to the Shapiro-Wilk test). Inversely, a p < .05 suggests your data is not normally distributed. Keep in mind however that there are several known problems with objective tests, so visual assessment is recommended. Nonetheless, for beginners, it can sometimes be useful (or rather, reassuring) to be able to pair the visual assessment to the tests values.

Density Plots

Some people think that density distributions are less useful than qqplots, but sometimes, people still like to look at distributions, so I made another function for this.

Load the function from my github:

source("https://raw.githubusercontent.com/RemPsyc/niceplots/master/niceDensityFunction.R")

Make the basic plot

niceDensity(variable = "Sepal.Length",
            group = "Species",
            data = iris)

Pro tip: Save density plots with a narrower width for better-looking results.

Further customization

Change (or reorder) colours, x-axis title or y-axis title, names of groups, no grid lines, or add the p-value from the Shapiro-Wilk (if you wish so).

niceDensity(variable = "Sepal.Length",
            group = "Species",
            data = iris,
            colours = c("#00BA38", "#619CFF", "#F8766D"),
            xtitle = "Sepal Length",
            ytitle = "Density (vs. Normal Distribution)",
            groups.labels = c("(a) Setosa", "(b) Versicolor", "(c) Virginica"),
            grid = FALSE,
            shapiro = TRUE,
            title = "Density (Sepal Length)")

Homoscedasticity (equality of variance)

To check homoscedasticity (equality of variance) for categorical predictors, we can check the variance within each group. We can use a rule of thumb that the variance of one group should not be four times that of another group. We can check that with this function which makes a table.

Load the function from my github:

source("https://raw.githubusercontent.com/RemPsyc/niceplots/master/niceVarFunction.R")

Make the basic table

View(niceVar(variable="Sepal.Length",
             group="Species",
             data=iris))

Let’s now try it for many variables to see how handy it can be.

# Define our dependent variables
DV <- names(iris[1:4])

# Make diagnostic table
var.table <- do.call("rbind", lapply(DV, niceVar, data=iris, group="Species"))

Use the Viewer for better results (or export it to Word using my niceTable() function)

View(var.table)

Visualizing Variance

It can also be interesting to look at variance visually for learning purposes.

Load the function from my github:

source("https://raw.githubusercontent.com/RemPsyc/niceplots/master/niceVarianceFunction.R")

Make the basic plot

niceVariance(variable = "Sepal.Length",
             group = "Species",
             data = iris)

Further customization

Change (or reorder) colours, y-axis title, or names of groups.

niceVariance(variable = "Sepal.Length", 
             group = "Species", 
             data = iris,
             colours = c("#00BA38", "#619CFF", "#F8766D"),
             ytitle = "Sepal Length",
             groups.labels = c("(a) Setosa", "(b) Versicolor", "(c) Virginica"))

\(~\)

\(~\)

Concluding Statement

Make sure to check out this page again if you use the code after a time or if you encounter errors, as I periodically update or improve the code.

You can always edit the function to suit your purposes, or contact me for questions or requests to modify this function at https://remi-theriault.com/contact! Thanks for reading my guide! :) \(~\)

\(~\)

\(~\)

\(~\)

\(~\)


Updated 2021-04-25 (added niceQQ(), niceDensity(), niceVar & niceVariance functions/sections)

\(~\)

\(~\)

\(~\)

\(~\)

\(~\)

\(~\)