Abstract
Data mining is a collection of analytical techniques for uncovering new trends and patterns in large databases. These techniques emphasize visualization, both to study the structure of the data thoroughly and to check the validity of a statistical model fitted to the data, and they lead to knowledge discovery. Data mining is an interdisciplinary research area spanning database management, machine learning, statistical computing, and expert systems. Although the term is relatively new, the technology is not. Data mining allows users to analyze data from many different dimensions or angles, explore and categorize it, and summarize the relationships identified. Large investments in technology and data collection are currently being made in precision agriculture, remote sensing, and bioinformatics. Experiments conducted in these disciplines are generating mountains of data at a rapid rate. Analyzing such massive data sets in combination with biological and environmental information would not be possible without automated and efficient data mining techniques. Effective statistical and graphical data mining tools can enable agricultural researchers to perform quicker and more cost-effective experiments. This paper presents commonly used statistical and graphical data mining techniques for data exploration and visualization, model selection, model development, checking for violations of statistical assumptions, and model validation.
Keywords
Data exploration, supervised learning, unsupervised learning, model validation
Creative Commons License
This work is licensed under a Creative Commons Attribution-Noncommercial-No Derivative Works 4.0 License.
Recommended Citation
Fernandez, George (2004). "APPLICATIONS OF STATISTICAL DATA MINING METHODS," Conference on Applied Statistics in Agriculture. https://doi.org/10.4148/2475-7772.1148