Data-driven collection development: Text mining college course catalogs

Keywords

Python, R, collection development, text mining, data science, academic libraries

Abstract

Academic librarians are trained to develop and manage collections. They rely on their own subject expertise in academic disciplines, input from teaching faculty, and professional training to make informed selections that support the institutional curriculum. Professional training in collection development has, in recent years, focused on evidence-based acquisition methods (Johnson, 2018, p. 134). College and university course catalogs are a potential but untapped source of evidence for identifying topics of importance to institutional curricula. Course descriptions concisely summarize the subjects covered in college or university courses and therefore indicate the topics about which students may require additional sources of information. Until recently, examining course catalogs was a time-consuming task. The advent of data and text mining techniques, however, makes it possible to analyze course descriptions with far less time and effort. This article contains a brief introduction to data science in libraries; details of the tools and processes used for collecting and cleaning course catalog data; and preliminary results of a project to mine course catalogs for changes in curriculum focus that can inform library collection development decisions.
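
As a rough illustration of the kind of analysis the abstract describes, the following Python sketch counts two-word phrases (bigrams) in course descriptions from two catalog years and lists the phrases that became more frequent. It is not the authors' method: the function names, the short stop-word list, and the sample descriptions are all assumptions invented for this example, and a real project would more likely use an established text mining library.

```python
import re
from collections import Counter

# Hypothetical stop-word list, kept deliberately small for this sketch.
STOP_WORDS = {"a", "an", "and", "the", "of", "to", "in", "for", "with", "on"}


def tokenize(text):
    """Lowercase the text and keep alphabetic tokens that are not stop words."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOP_WORDS]


def bigram_counts(descriptions):
    """Count adjacent word pairs (bigrams) across a list of course descriptions.

    Stop words are removed first, so words separated only by a stop word
    are treated as adjacent.
    """
    counts = Counter()
    for description in descriptions:
        tokens = tokenize(description)
        counts.update(zip(tokens, tokens[1:]))
    return counts


def rising_bigrams(earlier, later, top_n=5):
    """Return up to top_n bigrams whose counts grew between two catalog years."""
    before, after = bigram_counts(earlier), bigram_counts(later)
    change = {bigram: after[bigram] - before[bigram] for bigram in after}
    # Sort by largest increase first, then alphabetically for stable output.
    ranked = sorted(change.items(), key=lambda item: (-item[1], item[0]))
    return [(bigram, delta) for bigram, delta in ranked if delta > 0][:top_n]


# Invented example descriptions, not taken from any real course catalog.
catalog_earlier = [
    "Introduction to statistics and probability for the social sciences.",
    "Survey of database design and information retrieval.",
]
catalog_later = [
    "Introduction to data science with statistics and machine learning.",
    "Applied machine learning for information retrieval.",
]

top_changes = rising_bigrams(catalog_earlier, catalog_later)
for bigram, delta in top_changes:
    print(" ".join(bigram), delta)
```

In this toy input, "machine learning" shows the largest increase, while "information retrieval" appears equally often in both years and is therefore not reported. A selector could read such a list as a prompt to review holdings on the rising topics, with the caveat that raw counts from real catalogs would need cleaning and normalization first.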

Creative Commons License

This work is licensed under a Creative Commons Attribution-Noncommercial-Share Alike 4.0 License.

Recommended Citation

Sutton, Sarah W.; Tolbert, Hunter; and Harris, Khiana (2024) "Data-driven collection development: Text mining college course catalogs," Kansas Library Association College and University Libraries Section Proceedings: Vol. 14: No. 1. https://doi.org/10.4148/2160-942X.1093

References

Association of College and Research Libraries. (2018). Standards for libraries in higher education. https://www.ala.org/acrl/standards/standardslibraries

Been, J., Thompson, M., & Weber, M. (2023, March 6). Of course we want to see into the future: Data mining course catalogs. Electronic Resources & Libraries, Austin, TX.

Been, J., Thompson, M., & Weber, M. (2024, March 4). Data nebulae: Shaping library collections for the future. Electronic Resources & Libraries, Austin, TX.

Johnson, P. (2018). Fundamentals of collection development and management (4th ed.). American Library Association.

Finnell, J., & Fontane, W. (2010). Reference question data mining: A systematic approach to library outreach. Reference & User Services Quarterly, 49(3), 278–286. http://www.jstor.org/stable/20865263

Lin, S., & Scott, D. (2023). Hands-on data science for librarians. Chapman and Hall/CRC.

IBM. (2024, April 9). What is text mining? https://www.ibm.com/topics/text-mining

Kapadia, S. (2022). Topic modeling in Python: Latent Dirichlet Allocation (LDA). Retrieved May 13, 2024, from https://towardsdatascience.com/end-to-end-topic-modeling-in-python-latent-dirichlet-allocation-lda-35ce4ed6b3e0

MathWorks. (2024). What is an n-gram? https://www.mathworks.com/discovery/ngram.html

Rafique, A., Ameen, K., & Arshad, A. (2023). E-book data mining: Real information behavior of university academic community. Library Hi Tech, 41(2), 413–431. https://doi.org/10.1108/LHT-07-2020-0176

Silge, J., & Robinson, D. (2017). Text mining with R: A tidy approach. O’Reilly Media. https://www.tidytextmining.com/#welcome-to-text-mining-with-r

Tu, Y.-F., Chang, S.-C., & Hwang, G.-J. (2021). Analysing reader behaviours in self-service library stations using a bibliomining approach. The Electronic Library, 39(1), 1–16. https://doi.org/10.1108/EL-01-2020-0004

Included in

Collection Development and Management Commons, Data Science Commons
