IMPACT OF DATA TRANSFORMATION ON THE PERFORMANCE OF DIFFERENT CLUSTERING METHODS AND CLUSTER NUMBER DETERMINATION STATISTICS FOR ANALYZING GENE EXPRESSION PROFILE DATA

2002 - 14th Annual Conference Proceedings

Author Information

Guoping Shu
Beiyan Zeng
Deanne Wright
Oscar Smith

Abstract

We assessed the impact of 13 data transformation methods on the performance of four types of clustering methods (partitioning (K-means), hierarchical distance (Average Linkage), multivariate normal mixture, and nonparametric kernel density) and of four cluster number determination statistics (CNDS) (Pseudo F, Pseudo t², Cubic Clustering Criterion (CCC), and Bayesian Information Criterion (BIC)), using both simulated and real gene expression profile data. We found that the Square Root, Cube Root, and Spacing transformations had mostly positive impacts on the performance of the four types of clustering methods, whereas the Tukey's Bisquare and Interquartile Range transformations had mostly negative impacts. The impacts of the other transformation methods were specific to the clustering method and to the data type. The CNDS performed better on appropriately transformed data. Multivariate Mixture Clustering and Kernel Density Clustering outperformed K-means and Average Linkage in grouping both the simulated and the real gene expression profile data.
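The abstract names the transformations and cluster number determination statistics without defining them. As a minimal sketch, assuming a genes-by-conditions matrix held as nested Python lists, the code below applies three of the named transformations and scores a fixed partition with Pseudo F, using the standard Calinski-Harabasz definition (between-cluster sum of squares over k - 1, divided by within-cluster sum of squares over n - k). Reading the Interquartile Range transformation as column-wise (value - median) / IQR is an assumption, and all function names are hypothetical; this is an illustration, not the authors' implementation.

```python
import math
import statistics
from typing import List, Sequence

# Assumed layout (hypothetical): rows = genes, columns = conditions/arrays.
Matrix = List[List[float]]


def sqrt_transform(x: Matrix) -> Matrix:
    """Element-wise square root; assumes non-negative expression values."""
    return [[math.sqrt(v) for v in row] for row in x]


def cube_root_transform(x: Matrix) -> Matrix:
    """Element-wise real cube root; keeps the sign, so negative values are allowed."""
    return [[math.copysign(abs(v) ** (1.0 / 3.0), v) for v in row] for row in x]


def iqr_standardize(x: Matrix) -> Matrix:
    """Column-wise (v - median) / IQR; an assumed reading of 'Interquartile Range'.

    Assumes every column has at least two values and a nonzero IQR.
    """
    params = []
    for col in zip(*x):
        q1, _, q3 = statistics.quantiles(col, n=4)
        params.append((statistics.median(col), q3 - q1))
    return [[(v - m) / iqr for v, (m, iqr) in zip(row, params)] for row in x]


def pseudo_f(x: Matrix, labels: Sequence[int]) -> float:
    """Calinski-Harabasz pseudo F = [B / (k - 1)] / [W / (n - k)].

    B is the between-cluster and W the within-cluster sum of squares.
    Assumes 2 <= k < n and nonzero within-cluster scatter.
    """
    n, p = len(x), len(x[0])
    clusters = sorted(set(labels))
    k = len(clusters)
    grand = [sum(row[j] for row in x) / n for j in range(p)]
    between = within = 0.0
    for c in clusters:
        members = [row for row, lab in zip(x, labels) if lab == c]
        centroid = [sum(r[j] for r in members) / len(members) for j in range(p)]
        between += len(members) * sum((centroid[j] - grand[j]) ** 2 for j in range(p))
        within += sum(sum((r[j] - centroid[j]) ** 2 for j in range(p)) for r in members)
    return (between / (k - 1)) / (within / (n - k))


if __name__ == "__main__":
    # Hypothetical toy data: two groups of three "genes" measured under two conditions.
    data = [[1.0, 4.0], [1.2, 4.4], [0.9, 3.6], [9.0, 16.0], [9.6, 15.2], [8.8, 16.8]]
    labels = [0, 0, 0, 1, 1, 1]
    transforms = [
        ("raw", lambda m: m),
        ("square root", sqrt_transform),
        ("cube root", cube_root_transform),
        ("IQR", iqr_standardize),
    ]
    for name, transform in transforms:
        print(f"{name:12s} pseudo F = {pseudo_f(transform(data), labels):.1f}")
```

Printing Pseudo F for the same partition under each transformation shows, on toy data only, how a transformation can shift a CNDS value; it does not reproduce the study's comparisons or say which transformation is preferable for real expression data.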

Keywords

cluster analysis, gene expression profile, data transformation, data normalization, cluster number determination statistics, robustness, Pseudo F, Pseudo t², cubic clustering criterion, Bayesian information criterion, average linkage, k-means, multivariate mixture model, kernel density clustering, nonparametric clustering

Creative Commons License

This work is licensed under a Creative Commons Attribution-Noncommercial-No Derivative Works 4.0 License.

Recommended Citation

Shu, Guoping; Zeng, Beiyan; Wright, Deanne; and Smith, Oscar (2002). "IMPACT OF DATA TRANSFORMATION ON THE PERFORMANCE OF DIFFERENT CLUSTERING METHODS AND CLUSTER NUMBER DETERMINATION STATISTICS FOR ANALYZING GENE EXPRESSION PROFILE DATA," Conference on Applied Statistics in Agriculture. https://doi.org/10.4148/2475-7772.1203

Included in

Agriculture Commons, Applied Statistics Commons

Apr 28th, 1:00 PM

Conference on Applied Statistics in Agriculture