This chapter revisits a regression analysis to explore the normal least squares assumption of approximately equal variance, and considers some of the data transformations that can be used to meet it. A linear regression of transformed data is compared with the equivalent generalized linear model, which avoids transformation by using a link function and a non-normal distribution. Generalized linear models, fitted by maximum likelihood, use a link function to model the mean (in this case a square-root link) and a variance function to model the variability (in this case that of the gamma distribution, where the variance increases as the square of the mean). The Box–Cox family of transformations is explained in detail.
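As a rough illustration of the Box–Cox family mentioned above, the following is a minimal Python sketch of the standard one-parameter form, (y^λ − 1)/λ, with the log transformation as the limiting case at λ = 0. The function name `box_cox` and the sample values are hypothetical and are not taken from the chapter, which may present the family in a different notation or software.

```python
import math


def box_cox(y, lam):
    """Box-Cox transform of a single positive value y for power parameter lam.

    Hypothetical helper for illustration only.
    lam = 1   leaves the shape of the data unchanged (a shift by -1),
    lam = 0.5 behaves like a square-root transformation,
    lam = 0   is defined as the natural log (the limit as lam -> 0).
    """
    if y <= 0:
        raise ValueError("Box-Cox requires strictly positive values")
    if lam == 0:
        return math.log(y)
    return (y ** lam - 1.0) / lam


# Made-up positive responses, used only to show the effect of lam.
values = [1.0, 4.0, 9.0, 16.0]
for lam in (1.0, 0.5, 0.0):
    transformed = [round(box_cox(y, lam), 3) for y in values]
    print(f"lambda = {lam}: {transformed}")
```

Smaller values of λ compress large responses more strongly, which is why moving from λ = 1 towards λ = 0 can help when the variability grows with the mean.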

This chapter looks at three of the main types of generalized linear model (GLM). GLMs using the Poisson distribution are a good starting place when dealing with integer count data. The default log link function prevents the prediction of negative counts, and the Poisson distribution models the variance (assumed equal to the mean). GLMs with a binomial distribution are designed for the analysis of binomial counts (how many times something occurred relative to the total number of times it could have occurred). A logistic (logit) link function constrains predictions to lie above zero and below the maximum possible count, following the S-shaped logistic curve. Overdispersion, where the data are more variable than the assumed distribution allows, can be diagnosed and dealt with using a quasi-maximum likelihood extension to GLM analysis. Binomial GLMs can also be used to analyse binary data as a special case, with some minor differences in the analysis that arise from the constrained nature of binary data.
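The behaviour of the two link functions, and one common way of diagnosing overdispersion, can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions rather than the chapter's own analysis: the function names and the counts are made up, and the dispersion estimate shown (Pearson chi-square divided by the residual degrees of freedom) is computed only for the simplest case of an intercept-only Poisson model, whose fitted mean is the sample mean.

```python
import math


def inv_log_link(eta):
    """Inverse of the log link: any linear predictor maps to a positive mean."""
    return math.exp(eta)


def inv_logit_link(eta, n_trials=1):
    """Inverse of the logit link, scaled to an expected count in (0, n_trials)."""
    return n_trials / (1.0 + math.exp(-eta))


def pearson_dispersion(counts):
    """Pearson chi-square / residual df for an intercept-only Poisson model.

    Assumption: with no predictors the fitted mean is the sample mean,
    so the residual degrees of freedom are n - 1. Values well above 1
    suggest overdispersion relative to the Poisson distribution.
    """
    n = len(counts)
    mu = sum(counts) / n
    chi_sq = sum((y - mu) ** 2 / mu for y in counts)
    return chi_sq / (n - 1)


# Predictions stay within bounds however extreme the linear predictor is.
for eta in (-3.0, 0.0, 3.0):
    print(eta, inv_log_link(eta), inv_logit_link(eta, n_trials=10))

# Made-up counts, chosen to be more variable than a Poisson would imply.
example_counts = [0, 1, 1, 2, 9, 12]
print(pearson_dispersion(example_counts))
```

In a quasi-likelihood analysis an estimate of this kind is used to scale the standard errors, instead of fixing the dispersion at 1 as the Poisson and binomial distributions do.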