Each normality test takes normality as its null hypothesis and checks how compatible the data are with that assumption. A high p-value means there’s no evidence against the assumption. It does not prove the data are normal, but you can proceed as if they are. A low p-value means the data would be unlikely if they really came from a normal distribution, so you conclude that the data are not normal. This is one of the few occasions in statistics where you actually hope for a high p-value 🙂

Which test you use is a personal preference. In R, people use Shapiro-Wilk since the D’Agostino-Pearson omnibus test is not part of base R. In Prism, most people use D’Agostino-Pearson since it’s recommended by GraphPad. If I can, I do both.

Both tests check normality, but each in a different way. Shapiro-Wilk compares the ordered data values with the values you would expect from a normal distribution, which essentially measures how straight the QQ plot is.
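To get a feel for what "comparing the data to a normal distribution" means, here is a minimal sketch that correlates the sorted data with the normal quantiles expected for each rank. This is not the Shapiro-Wilk statistic itself, only a simplified relative of it (in the spirit of a probability-plot correlation). The function name `qq_correlation`, the use of Blom's plotting positions and the example values are all assumptions made for illustration.

```python
from statistics import NormalDist, fmean


def qq_correlation(values):
    """Correlation between the sorted data and the normal quantiles expected per rank."""
    ordered = sorted(values)
    n = len(ordered)
    std_normal = NormalDist()
    # Blom's plotting positions approximate the expected normal order statistics.
    expected = [std_normal.inv_cdf((i - 0.375) / (n + 0.25)) for i in range(1, n + 1)]

    mean_x, mean_z = fmean(ordered), fmean(expected)
    cov = sum((x - mean_x) * (z - mean_z) for x, z in zip(ordered, expected))
    var_x = sum((x - mean_x) ** 2 for x in ordered)
    var_z = sum((z - mean_z) ** 2 for z in expected)
    return cov / (var_x * var_z) ** 0.5


# Made-up illustration data, not from a real experiment.
roughly_normal = [4.1, 4.6, 4.9, 5.0, 5.2, 5.3, 5.5, 5.8, 6.0, 6.6]
right_skewed = [1, 1, 1, 2, 2, 3, 5, 9, 20, 60]

r_normal = qq_correlation(roughly_normal)
r_skewed = qq_correlation(right_skewed)
print(f"roughly normal: r = {r_normal:.3f}")
print(f"right skewed:   r = {r_skewed:.3f}")
```

A value close to 1 means the points in the QQ plot lie close to a straight line. The skewed sample gives a clearly lower correlation.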

D’Agostino-Pearson compares the skewness (asymmetry) and the kurtosis (weight of the tails) of the data with those of a normal distribution.
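The two quantities this test looks at can be computed by hand. The sketch below uses the simple moment-based definitions of skewness and excess kurtosis. It does not reproduce the test itself, which transforms and combines both values into a single statistic. The function names and example values are illustrative assumptions.

```python
from statistics import fmean


def central_moment(values, order):
    """Average of (x - mean) ** order over the sample."""
    mean = fmean(values)
    return sum((x - mean) ** order for x in values) / len(values)


def skewness(values):
    """0 for a symmetric sample, positive when the right tail is longer."""
    return central_moment(values, 3) / central_moment(values, 2) ** 1.5


def excess_kurtosis(values):
    """0 for a normal distribution, positive for heavier tails, negative for lighter ones."""
    return central_moment(values, 4) / central_moment(values, 2) ** 2 - 3.0


# Made-up illustration data.
flat = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]             # symmetric, light tails
long_right_tail = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 30.0]  # one long tail to the right

for name, sample in [("flat", flat), ("long right tail", long_right_tail)]:
    print(f"{name}: skewness = {skewness(sample):.2f}, "
          f"excess kurtosis = {excess_kurtosis(sample):.2f}")
```

For a normal distribution both values are 0, so the further they are from 0, the stronger the evidence against normality.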

The two tests take very different approaches to checking normality, so it makes sense to use both of them. In most cases they will agree.

If the two tests disagree, you have to look at the data (e.g. by making a histogram) and find out why. D’Agostino-Pearson will be affected more by outliers than Shapiro-Wilk: outliers have a big impact on the skewness but not on the slope of the central half of the data in the QQ plot.
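The effect of an outlier on the skewness versus the central part of the QQ plot can be illustrated with a small sketch. It takes a made-up sample, replaces its largest value with an outlying one, and compares the skewness and the slope of the central half of the QQ plot before and after. The helper names, the choice of Blom's plotting positions and the data are assumptions for illustration only.

```python
from statistics import NormalDist, fmean


def skewness(values):
    """Moment-based sample skewness."""
    mean = fmean(values)
    m2 = fmean([(x - mean) ** 2 for x in values])
    m3 = fmean([(x - mean) ** 3 for x in values])
    return m3 / m2 ** 1.5


def central_qq_slope(values):
    """Least-squares slope of the QQ plot, using only the central half of the data."""
    ordered = sorted(values)
    n = len(ordered)
    std_normal = NormalDist()
    zs, xs = [], []
    for rank, x in enumerate(ordered, start=1):
        p = (rank - 0.375) / (n + 0.25)  # Blom's plotting position
        if 0.25 <= p <= 0.75:            # keep the central half only
            zs.append(std_normal.inv_cdf(p))
            xs.append(x)
    mean_z, mean_x = fmean(zs), fmean(xs)
    numerator = sum((z - mean_z) * (x - mean_x) for z, x in zip(zs, xs))
    denominator = sum((z - mean_z) ** 2 for z in zs)
    return numerator / denominator


# Made-up illustration data: the second sample only differs in its largest value.
clean = [4.1, 4.6, 4.9, 5.0, 5.2, 5.3, 5.5, 5.8, 6.0, 6.6]
with_outlier = clean[:-1] + [12.0]

for name, sample in [("clean", clean), ("with outlier", with_outlier)]:
    print(f"{name}: skewness = {skewness(sample):.2f}, "
          f"central QQ slope = {central_qq_slope(sample):.3f}")
```

The skewness jumps when the outlier is introduced, while the central slope does not move at all because the middle values are unchanged.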

If you have few replicates, you either check the normality of the residuals or choose one of the following:

- Assume the data are normally distributed and do a parametric test. If the data are not really normal the test can generate false positives.
- Assume the data are not normal and do a non-parametric test. If the data in reality are normal the test can generate false negatives.

If you only have 3 measurements per group, non-parametric tests will be too stringent, so you’ll typically use a parametric test. However, you have to realize that the outcome might be a false positive.
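A quick calculation shows how stringent a non-parametric test is with so few values. The sketch below assumes the non-parametric test is an exact two-sided Mann-Whitney (rank-sum) test without ties, which the text above does not specify. In that case the most extreme outcome, complete separation of the two groups, is one of `comb(n1 + n2, n1)` equally likely rank arrangements in each direction. The function name is hypothetical.

```python
from math import comb


def min_two_sided_p(n1, n2):
    """Smallest two-sided p-value an exact rank-sum test can return (no ties)."""
    arrangements = comb(n1 + n2, n1)  # ways to assign the ranks to group 1
    return min(1.0, 2 / arrangements)


for n in (3, 4, 5):
    print(f"{n} per group: smallest possible p = {min_two_sided_p(n, n):.4f}")
```

With 3 values per group the smallest possible p-value is 2/20 = 0.1, so under these assumptions the test can never reach significance at the 0.05 level, however large the difference between the groups.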

Residuals are calculated for each group separately and then combined: compute the mean of each group and subtract that mean from every data value in the group.
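The residual calculation can be sketched in a few lines. The group names and values below are made up for illustration, and `pooled_residuals` is a hypothetical helper name.

```python
from statistics import fmean


def pooled_residuals(groups):
    """Subtract each group's own mean from its values, then combine all residuals."""
    residuals = []
    for values in groups.values():
        group_mean = fmean(values)
        residuals.extend(x - group_mean for x in values)
    return residuals


# Made-up illustration data: two groups with 3 replicates each.
groups = {
    "control": [4.0, 5.0, 6.0],
    "treated": [9.0, 10.0, 14.0],
}

residuals = pooled_residuals(groups)
print(residuals)
```

The combined list has 6 residuals instead of 3 values per group, and it is this pooled list that you would check for normality.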

Normality tests are not useful when you have few replicates: with so few values they have too little power to detect deviations from normality.

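Normality tests are not reliable for large data sets. They are too stringent: they flag even tiny, practically irrelevant deviations from normality as significant, so they will say the data are not normal while, for practical purposes, they are.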

Histograms are reliable. If you don’t see a bell curve, the data are not normal. For data sets with more than 30 values you can assume normality even if the histogram looks a bit skewed. If the histogram looks very skewed, a log transformation will often fix this. Only in extreme cases, e.g. when you see two bell curves instead of one, may you not assume normality. In that case, the data represent two populations instead of one and you will have to find the factor that separates these two populations, e.g. there might be a difference between males and females or between young and old individuals.
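The effect of a log transformation on skewed data can be checked numerically. The sketch below uses deliberately idealized, made-up values that double at each step, so the log-transformed values are evenly spaced and perfectly symmetric. Real data will not behave this cleanly, and the transformation only works for strictly positive values.

```python
from math import log
from statistics import fmean


def skewness(values):
    """Moment-based sample skewness (0 for a symmetric sample)."""
    mean = fmean(values)
    m2 = fmean([(x - mean) ** 2 for x in values])
    m3 = fmean([(x - mean) ** 3 for x in values])
    return m3 / m2 ** 1.5


# Made-up, idealized data: each value is twice the previous one.
raw = [1.0, 2.0, 4.0, 8.0, 16.0, 32.0, 64.0]
logged = [log(x) for x in raw]  # requires all values to be > 0

skew_raw = skewness(raw)
skew_logged = skewness(logged)
print(f"skewness before log transformation: {skew_raw:.2f}")
print(f"skewness after log transformation:  {skew_logged:.2f}")
```

The raw values have a long right tail, while the log-transformed values are symmetric around their mean.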
