#### Abstract

A simulation study was conducted to determine how well SAS® PROC GLIMMIX (SAS Institute, Cary, NC), statistical software for fitting generalized linear mixed models (GLMMs), performed for a simple GLMM when run with its default settings, as a naïve user would run it. Data were generated from a wide variety of distributions with the same sets of linear predictors, and under several conditions. The data sets were then analyzed first with the correct model (the generating and estimating models were the same) and subsequently with a misspecified estimating model, in all cases using default settings. The data generation model was a randomized complete block design in which the model parameters and sample sizes were adjusted to yield 80% power for the F-test on treatment means, given a 30-block experiment with block-by-treatment interaction and with additional treatment replications within each block. Convergence rates were low for the exponential and Poisson distributions, even when the generating and estimating models matched. The normal and lognormal distributions converged 100% of the time; convergence rates for other distributions varied. As expected, reducing the number of blocks from 30 to five and increasing replications within blocks to keep total N the same reduced power to 40% or less. Except for the exponential distribution, estimates of treatment means and variance parameters were accurate, with only slight biases. Misspecifying the estimating model by omitting the block-by-treatment random effect made F-tests too liberal. Because omitting that term, which effectively ignores a process involved in giving rise to the data, produces symptoms of over-dispersion, several potential remedies were investigated. For all distributions, the historically recommended variance stabilizing transformation was applied, and the transformed data were then fit using a linear mixed model. For one-parameter members of the exponential family, an over-dispersion parameter was included in the estimating model. The negative binomial distribution was also examined as the estimating model distribution. None of these remedial steps corrected the over-dispersion problem created by misspecifying the linear predictor, although using a variance stabilizing transformation did improve convergence rates for most of the distributions investigated.
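
The link between the omitted block-by-treatment term and the symptoms of over-dispersion can be illustrated with a small sketch. It is not the simulation code used in the study and does not call PROC GLIMMIX: it generates Poisson counts from a randomized complete block design with a log link, then compares the Pearson chi-square divided by its degrees of freedom for a fit that keeps the block-by-treatment cells against a fit that omits them. Blocks are treated as fixed effects so that the fitted means have a closed form, and every numeric value (treatment effects, standard deviations, replication count, seed) is a hypothetical choice made only for illustration.

```python
import math
import random


def poisson(rng, mean):
    """Draw one Poisson variate by Knuth's multiplication method (adequate for moderate means)."""
    limit = math.exp(-mean)
    k, p = 0, rng.random()
    while p > limit:
        k += 1
        p *= rng.random()
    return k


def simulate_rcbd(rng, n_blocks=30, n_trt=2, n_reps=4,
                  eta=(2.0, 2.3), sd_block=0.3, sd_bt=0.7):
    """Return (block, treatment, count) rows.

    Log-scale linear predictor: eta[trt] + block effect + block-by-treatment effect.
    All default values are hypothetical, not taken from the study.
    """
    rows = []
    for b in range(n_blocks):
        block_eff = rng.gauss(0.0, sd_block)
        for t in range(n_trt):
            bt_eff = rng.gauss(0.0, sd_bt)
            mean = math.exp(eta[t] + block_eff + bt_eff)
            for _ in range(n_reps):
                rows.append((b, t, poisson(rng, mean)))
    return rows


def pearson_dispersion(rows, n_blocks, n_trt, n_reps, keep_interaction):
    """Pearson chi-square / df for a fixed-effects Poisson fit of a balanced design.

    keep_interaction=True : one fitted mean per block-by-treatment cell.
    keep_interaction=False: block + treatment main effects only; for balanced data
                            the fitted cell total is block total * treatment total / grand total.
    """
    cell = [[0] * n_trt for _ in range(n_blocks)]
    for b, t, y in rows:
        cell[b][t] += y
    block_tot = [sum(r) for r in cell]
    trt_tot = [sum(cell[b][t] for b in range(n_blocks)) for t in range(n_trt)]
    grand = sum(block_tot)

    chi2 = 0.0
    for b, t, y in rows:
        if keep_interaction:
            mu = cell[b][t] / n_reps
        else:
            mu = block_tot[b] * trt_tot[t] / (grand * n_reps)
        if mu > 0:  # a zero fitted mean implies a zero count, which contributes nothing
            chi2 += (y - mu) ** 2 / mu
    n_par = n_blocks * n_trt if keep_interaction else n_blocks + n_trt - 1
    return chi2 / (len(rows) - n_par)


rng = random.Random(2024)
rows = simulate_rcbd(rng)
print("with block-by-treatment cells:", round(pearson_dispersion(rows, 30, 2, 4, True), 2))
print("block-by-treatment omitted:  ", round(pearson_dispersion(rows, 30, 2, 4, False), 2))
```

Under these assumptions the ratio should sit near 1 when the cells are kept and rise well above 1 when they are omitted, even though every observation is Poisson given its cell mean. The excess variation comes from the linear predictor, which is consistent with the finding that distributional remedies did not correct the problem.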

#### Creative Commons License

This work is licensed under a Creative Commons Attribution-Noncommercial-No Derivative Works 4.0 License.

GENERALIZED LINEAR MIXED MODEL ESTIMATION USING PROC GLIMMIX: RESULTS FROM SIMULATIONS WHEN THE DATA AND MODEL MATCH, AND WHEN THE MODEL IS MISSPECIFIED
