Student Major/Year in School
Computer Science, second year
Faculty Mentor Information
Dr. William Hsu, Department of Computer Science, College of Engineering
Abstract
As the digital age advances, the amount of available data can, at times, be overwhelming. To cope with this, we have begun to use machine learning algorithms and text mining techniques to help us make sense of all this readily available information. This work uses machine learning and Natural Language Processing to identify ingredients and recipe-containing sentences within manufactured material documents. With this technology, text documents in the STEM fields can be transformed into meaningful values that a computer can interpret. One of our goals is to assist future scholars in their research by cutting down the time needed to analyze and re-analyze an entire paper. The features highlighted here are Part-Of-Speech tagging, Named Entity Recognition, measurements, and Wikification possibilities. We manually annotate text documents and, once these features are extracted, use them to train a Naïve Bayes classifier. Our initial results indicate favorable precision at the cost of low recall. Current research is focused on eliminating the abundance of false positives to improve these values. In addition, we have not found a correlation between the number of Wikification possibilities and recipe-containing sentences. The ultimate goal of this project is to present step-by-step instructions for a recipe without requiring additional resources, and to provide alternative known recipes with their source articles.
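The precision and recall figures mentioned above can be illustrated with a minimal sketch. This is not the project's evaluation code: the function name `precision_recall` and the toy label lists are hypothetical, and a label of 1 is assumed to mark a recipe-containing sentence. Precision is the fraction of sentences flagged as recipe-containing that truly are, and recall is the fraction of true recipe-containing sentences that were flagged.

```python
def precision_recall(gold, predicted):
    """Return (precision, recall) for binary labels.

    A label of 1 is assumed to mean "recipe-containing sentence".
    """
    pairs = list(zip(gold, predicted))
    tp = sum(1 for g, p in pairs if g == 1 and p == 1)  # correctly flagged
    fp = sum(1 for g, p in pairs if g == 0 and p == 1)  # wrongly flagged
    fn = sum(1 for g, p in pairs if g == 1 and p == 0)  # missed
    # Guard against division by zero when nothing is flagged or nothing is positive.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Hypothetical toy data: four recipe sentences, only one of which is detected.
gold = [1, 1, 1, 1, 0, 0, 0, 0]
predicted = [1, 0, 0, 0, 0, 0, 0, 0]

precision, recall = precision_recall(gold, predicted)
print(f"precision={precision:.2f} recall={recall:.2f}")
# prints "precision=1.00 recall=0.25"
```

In this toy case every flagged sentence is correct, so precision is high, but three of the four recipe sentences are missed, so recall is low.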
Creative Commons License
This work is licensed under a Creative Commons Attribution-Noncommercial 4.0 License
Recommended Citation
Saenz-Gardea, Erick (2019). "Identifying Sentences with Recipe information with Natural Language Processing," Kansas State University Undergraduate Research Conference. https://newprairiepress.org/ksuugradresearch/2019/posters/47
Identifying Sentences with Recipe information with Natural Language Processing