Abstract

Variance Inflation Factors (VIFs) are used to detect collinearity among predictors in regression models. Textbook explanations of collinearity and of diagnostics such as VIFs have focused on numeric predictors being "co-linear" or "co-planar", with little attention paid to VIFs when a dummy variable is included in the model. This work was motivated by two regression models with high VIFs for which "standard" interpretations of the causes of collinearity made no sense. The first was an alfalfa-breeding model with two numeric predictors and two dummy variables. The second was an economic model with one numeric predictor, one dummy, and the numeric × dummy cross-product. This paper gives formulas for the VIFs of several regression models containing a dummy variable; these formulas show that the VIFs are functions of the numeric predictors' means, sums of squares, and sample sizes within the two dummy groups. The economic regression model is also presented to illustrate how high VIFs arose in these data. Researchers should be cautious about using high VIFs as a reason to delete predictors in general, and especially when dummy variables are involved. It is recommended that collinearity diagnostics be applied to the numeric predictors first, to check for collinearity without the influence of any dummies, and that the dummy variables then be added one at a time to see their effect on the VIFs.
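To make the dependence on group summaries concrete, consider the simplest case: one numeric predictor x, one 0/1 dummy D, and an intercept. Standard least-squares algebra (our own illustration; the paper's formulas and notation may differ) gives the VIF shared by x and D as VIF = 1 + (n0*n1/n)*(xbar1 - xbar0)^2/W, where n0 and n1 are the group sizes, xbar0 and xbar1 the group means of x, and W the pooled within-group sum of squares of x. It follows because the total sum of squares of x splits into W plus the between-group part (n0*n1/n)*(xbar1 - xbar0)^2, and r^2 between x and D is the between-group share. The minimal Python sketch below checks this against the usual 1/(1 - r^2) definition; the function names and toy data are hypothetical.

```python
from statistics import mean


def pearson_r(u, v):
    """Pearson correlation of two equal-length numeric sequences."""
    mu, mv = mean(u), mean(v)
    suv = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    suu = sum((a - mu) ** 2 for a in u)
    svv = sum((b - mv) ** 2 for b in v)
    return suv / (suu * svv) ** 0.5


def vif_two_predictors(x, d):
    """VIF shared by x and d in y ~ 1 + x + d, via the usual 1 / (1 - r^2)."""
    r = pearson_r(x, d)
    return 1.0 / (1.0 - r * r)


def vif_from_groups(x, d):
    """Same VIF from within-group summaries of x (d coded 0/1):

        VIF = 1 + (n0 * n1 / n) * (xbar1 - xbar0)**2 / W

    where W is the pooled within-group sum of squares of x.
    """
    g0 = [xi for xi, di in zip(x, d) if di == 0]
    g1 = [xi for xi, di in zip(x, d) if di == 1]
    n0, n1 = len(g0), len(g1)
    m0, m1 = mean(g0), mean(g1)
    w = sum((v - m0) ** 2 for v in g0) + sum((v - m1) ** 2 for v in g1)
    return 1.0 + (n0 * n1 / (n0 + n1)) * (m1 - m0) ** 2 / w


if __name__ == "__main__":
    d = [0, 0, 0, 1, 1, 1]
    # Toy data (invented): same within-group spread, different gap in group means.
    for x in ([1, 2, 3, 4, 5, 6], [10, 11, 12, 20, 21, 22]):
        print(vif_from_groups(x, d), round(vif_two_predictors(x, d), 6))
```

With this toy data, W stays at 4 while the gap between the group means of x grows from 3 to 10, and the VIF rises from 4.375 to 38.5, even though the model contains only one numeric predictor.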
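The recommended order of diagnostics (numeric predictors alone first, then dummy terms added one at a time) can be sketched as follows, assuming VIFs are computed as the diagonal of the inverse correlation matrix of the predictors, which equals 1/(1 - R_j^2) when the model has an intercept. The helper names (`vifs`, `vifs_adding_dummies`) and the toy data are hypothetical; the data are only shaped like the economic model (numeric x, dummy D, and the cross-product x*D), not taken from it.

```python
from statistics import mean


def corr_matrix(cols):
    """Pearson correlation matrix of equal-length predictor columns."""
    cs = []
    for c in cols:
        m = mean(c)
        cs.append([v - m for v in c])
    norms = [sum(v * v for v in c) ** 0.5 for c in cs]
    k = len(cs)
    return [[sum(a * b for a, b in zip(cs[i], cs[j])) / (norms[i] * norms[j])
             for j in range(k)] for i in range(k)]


def invert(m):
    """Gauss-Jordan inverse with partial pivoting; fine for a handful of predictors."""
    n = len(m)
    aug = [list(map(float, row)) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(m)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def vifs(cols):
    """VIF_j = j-th diagonal element of the inverse correlation matrix."""
    inv = invert(corr_matrix(cols))
    return [inv[j][j] for j in range(len(cols))]


def vifs_adding_dummies(numeric, extra):
    """VIFs of the numeric predictors alone, then after adding each
    dummy-based term (dummy or cross-product) one at a time.

    numeric, extra: dicts mapping a predictor name to its column.
    Returns a list of (names, VIFs) pairs, one per step.
    """
    names = list(numeric)
    cols = [numeric[k] for k in names]
    steps = [(tuple(names), vifs(cols))]
    for name, col in extra.items():
        names.append(name)
        cols.append(col)
        steps.append((tuple(names), vifs(cols)))
    return steps


if __name__ == "__main__":
    # Toy data (invented for illustration).
    x = [1, 2, 3, 4, 5, 6]
    d = [0, 0, 0, 1, 1, 1]
    xd = [xi * di for xi, di in zip(x, d)]
    for names, v in vifs_adding_dummies({"x": x}, {"D": d, "x*D": xd}):
        print(names, [round(val, 3) for val in v])
```

On this toy data the numeric predictor alone has a VIF of 1, adding D raises both VIFs to 4.375, and adding x*D raises the VIF of D to 22.75. Replacing x with x + 100 leaves the fitted values unchanged (the column space is the same) yet pushes the VIF of D above 1000, one way a high VIF can arise from the location of the numeric predictor alone.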

Keywords

collinearity, multicollinearity, indicator variable

Creative Commons License

This work is licensed under a Creative Commons Attribution-Noncommercial-No Derivative Works 4.0 License.

Apr 29th, 2:40 PM

VARIANCE INFLATION FACTORS IN REGRESSION MODELS WITH DUMMY VARIABLES