Abstract
Variance Inflation Factors (VIFs) are used to detect collinearity among predictors in regression models. Textbook explanations of collinearity and of diagnostics such as VIFs have focused on numeric predictors being "co-linear" or "co-planar", with little attention paid to VIFs when a dummy variable is included in the model. This work was motivated by two regression models with high VIFs, where "standard" interpretations of the causes of collinearity made no sense. The first was an alfalfa-breeding model with two numeric predictors and two dummy variables. The second was an economic model with one numeric predictor, one dummy variable, and the numeric × dummy cross-product. This paper gives formulas for VIFs in several regression models with a dummy variable. The formulas indicate that these VIFs are functions of the numeric predictors' means, sums of squares, and sample sizes within the two dummy groups. The economic regression model is also presented to illustrate how high VIFs arose in those data. Researchers should be cautious about using high VIFs as a reason for deleting predictors in general, and especially so when dummy variables are involved. It is recommended that collinearity diagnostics first be applied to the numeric predictors alone, to check for collinearity without the influence of any dummies, and that dummy variables then be added one at a time to see their effect on the VIFs.
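The recommended procedure can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the data are hypothetical, and the function names (`correlation_matrix`, `invert`, `vifs`) are invented for this example. It relies on the standard identity that the VIF of predictor j, 1/(1 − R_j²), equals the j-th diagonal element of the inverse of the predictors' correlation matrix.

```python
from math import sqrt

def correlation_matrix(columns):
    """Pearson correlation matrix of a list of equal-length columns."""
    n = len(columns[0])
    centered = []
    for col in columns:
        mean = sum(col) / n
        centered.append([v - mean for v in col])
    norms = [sqrt(sum(v * v for v in c)) for c in centered]
    k = len(columns)
    return [[sum(a * b for a, b in zip(centered[i], centered[j]))
             / (norms[i] * norms[j]) for j in range(k)] for i in range(k)]

def invert(matrix):
    """Matrix inverse by Gauss-Jordan elimination with partial pivoting."""
    k = len(matrix)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(k)]
           for i, row in enumerate(matrix)]
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(aug[r][c]))
        aug[c], aug[p] = aug[p], aug[c]
        pivot = aug[c][c]
        aug[c] = [v / pivot for v in aug[c]]
        for r in range(k):
            if r != c:
                f = aug[r][c]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[c])]
    return [row[k:] for row in aug]

def vifs(predictors):
    """VIF of each predictor: diagonal of the inverse correlation matrix."""
    names = list(predictors)
    inv = invert(correlation_matrix([predictors[n] for n in names]))
    return {name: inv[i][i] for i, name in enumerate(names)}

# Hypothetical data: one numeric predictor, one dummy, and their product.
x = [10, 11, 12, 13, 14, 15, 16, 17]
d = [0, 0, 0, 0, 1, 1, 1, 1]
xd = [xi * di for xi, di in zip(x, d)]

# Numeric predictor first, then add the dummy terms one at a time.
step1 = vifs({"x": x})
step2 = vifs({"x": x, "d": d})
step3 = vifs({"x": x, "d": d, "x*d": xd})

for label, result in [("x only", step1), ("x + d", step2),
                      ("x + d + x*d", step3)]:
    print(label, {name: round(v, 2) for name, v in result.items()})
```

Comparing the VIFs from one step to the next shows how much of the inflation is attributable to each added dummy term rather than to the numeric predictors themselves. With a single predictor the VIF is trivially 1, so the first step becomes informative only when the model has two or more numeric predictors.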
Keywords
collinearity, multicollinearity, indicator variable
Creative Commons License
This work is licensed under a Creative Commons Attribution-Noncommercial-No Derivative Works 4.0 License.
Recommended Citation
Murray, Leigh; Nguyen, Hien; Lee, Yu-Feng; Remmenga, Marta D.; and Smith, David W. (2012). "VARIANCE INFLATION FACTORS IN REGRESSION MODELS WITH DUMMY VARIABLES," Conference on Applied Statistics in Agriculture. https://doi.org/10.4148/2475-7772.1034