#### Abstract

A common practice in many scientific disciplines is to take measurements on several different variables on each unit from a designed experiment. This practice is cost-efficient and results in data that may be analyzed using multivariate statistical methods. Usually, principal components analysis (PCA) is conducted by decomposing the covariance matrix of the several dependent variables using eigenanalysis, without accounting for possible correlations among the observations. To evaluate how correlated observations bias PCA results, we used algebraic derivation and simulation for several different types of correlation structures. Our results indicated that sampling error generally had a much larger impact on the bias of PCA results than correlation between the observations. If we ignore the sampling error and there are no time trends or treatment effects, the PCs and the percent variance explained by each PC are not affected by correlated observations; however, the eigenvalues are biased. If the sampling error is considered, then for moderate-sized correlations between observations and reasonably sized designs, the bias was generally small enough to ignore for the first PC. Otherwise, SAS PROC MIXED may be used to easily correct for correlated observations, resulting in less bias in the PCA results.
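
The claim that, with sampling error ignored, correlated observations bias the eigenvalues but not the PCs or the percent variance explained can be illustrated numerically. The sketch below is not taken from the paper. It assumes a separable covariance structure in which every variable shares the same between-observation correlation matrix `r` (compound symmetry with correlation `rho`), and it assumes constant means, that is, no time trends or treatment effects. The names `sigma`, `r`, `rho`, `expected_sample_cov`, and `pca` are hypothetical and chosen only for illustration.

```python
import numpy as np


def expected_sample_cov(omega, n, p):
    """Expected value of the usual sample covariance matrix S.

    omega is the (n*p x n*p) covariance of the column-stacked data, so the
    (j, k) block of size n x n is Cov(y_j, y_k) for variables j and k.
    Assumes each variable has a constant mean across observations.
    """
    centering = np.eye(n) - np.ones((n, n)) / n
    out = np.empty((p, p))
    for j in range(p):
        for k in range(p):
            block = omega[j * n:(j + 1) * n, k * n:(k + 1) * n]
            # E[y_j' C y_k] = trace(C Cov(y_j, y_k)) when means are constant
            out[j, k] = np.trace(centering @ block) / (n - 1)
    return out


def pca(cov):
    """Eigenvalues (descending), eigenvectors, and proportion of variance."""
    vals, vecs = np.linalg.eigh(cov)
    order = np.argsort(vals)[::-1]
    vals, vecs = vals[order], vecs[:, order]
    return vals, vecs, vals / vals.sum()


# Assumed covariance among three variables (illustrative values only).
sigma = np.array([[4.0, 1.5, 0.5],
                  [1.5, 3.0, 1.0],
                  [0.5, 1.0, 2.0]])
n, p, rho = 10, 3, 0.3

# Assumed compound-symmetric correlation among the n observations.
r = (1 - rho) * np.eye(n) + rho * np.ones((n, n))

# Separable structure: block (j, k) of omega equals sigma[j, k] * r.
omega = np.kron(sigma, r)

true_vals, true_vecs, true_pct = pca(sigma)
exp_vals, exp_vecs, exp_pct = pca(expected_sample_cov(omega, n, p))

print("eigenvalue ratio:", exp_vals / true_vals)
print("percent variance unchanged:", np.allclose(exp_pct, true_pct))
print("PCs unchanged up to sign:",
      np.allclose(np.abs(exp_vecs), np.abs(true_vecs)))
```

Under these assumptions the expected sample covariance is a scalar multiple of `sigma`, so every eigenvalue is scaled by the same factor (here `1 - rho`) while the eigenvectors and the proportions of variance are unchanged. Other correlation structures, and any structure that differs between variables, would need to be checked separately.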

#### Creative Commons License

This work is licensed under a Creative Commons Attribution-Noncommercial-No Derivative Works 4.0 License.

BIAS IN PRINCIPAL COMPONENTS ANALYSIS DUE TO CORRELATED OBSERVATIONS
