Presenter Information

Erick Saenz-Gardea

Student Major/Year in School

Computer Science, second year

Faculty Mentor Information

Dr. William Hsu, Department of Computer Science, College of Engineering

Abstract

As the world becomes increasingly digital, the amount of available data can at times be overwhelming. To manage it, we have begun using machine learning algorithms and text mining techniques to make sense of this readily available information. This work uses machine learning and Natural Language Processing to identify ingredients and recipe-containing sentences in documents about manufactured materials. With this technology, text documents in STEM fields can be transformed into meaningful values that a computer can interpret. One of our goals is to assist future scholars by reducing the time needed to analyze and re-analyze an entire paper. The features highlighted here are Part-of-Speech tagging, Named Entity Recognition, measurements, and Wikification possibilities. After these features are extracted, we manually annotate text documents to train a Naïve Bayes classifier. Our initial results show favorable precision at the cost of low recall. Current research focuses on eliminating the abundance of false positives to improve these values. In addition, we have not found a correlation between the number of Wikification possibilities and recipe-containing sentences. The ultimate goal of this project is to present step-by-step instructions for a recipe without requiring additional resources, and to provide alternative known recipes along with their source articles.
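The abstract describes the pipeline only at a high level: extract per-sentence features, train a Naïve Bayes classifier on manually annotated sentences, and judge it by precision and recall. The following is a minimal sketch of that flow under stated assumptions, not the project's implementation. The regexes `MEASUREMENT` and `CHEMICAL` and the `ACTION_VERBS` set are hypothetical stand-ins for the POS-tagging, NER, and measurement features; Wikification is omitted; and the Bernoulli (binary-feature) variant of Naïve Bayes is an assumption, since the abstract does not say which variant was used. The labelled sentences are invented toy data, so the printed numbers say nothing about the project's actual results.

```python
import math
import re

# Hypothetical stand-ins for the abstract's features; the project's actual
# POS tagger, NER model, and Wikification lookup are not reproduced here.
MEASUREMENT = re.compile(r"\b\d+(?:\.\d+)?\s*(?:mg|g|kg|mL|L|mol|°C|K|min|h|s)\b")
CHEMICAL = re.compile(r"\b[A-Z][a-z]?\d*(?:[A-Z][a-z]?\d*)+\b")  # e.g. TiO2, NaCl
ACTION_VERBS = {"added", "annealed", "dissolved", "dried", "heated", "mixed", "stirred"}


def features(sentence):
    """Map a sentence to binary features (a crude proxy for POS/NER output)."""
    words = set(re.findall(r"[a-z]+", sentence.lower()))
    return {
        "has_action_verb": bool(words & ACTION_VERBS),
        "has_chemical": bool(CHEMICAL.search(sentence)),
        "has_measurement": bool(MEASUREMENT.search(sentence)),
    }


class BernoulliNaiveBayes:
    """Naive Bayes over binary features with Laplace (add-one) smoothing."""

    def fit(self, X, y):
        self.classes = sorted(set(y))
        self.feature_names = sorted(X[0])
        self.log_prior, self.p_true = {}, {}
        for c in self.classes:
            rows = [x for x, label in zip(X, y) if label == c]
            self.log_prior[c] = math.log(len(rows) / len(y))
            # P(feature is True | class c), smoothed so it is never 0 or 1.
            self.p_true[c] = {
                f: (sum(r[f] for r in rows) + 1) / (len(rows) + 2)
                for f in self.feature_names
            }
        return self

    def predict(self, x):
        def log_posterior(c):
            score = self.log_prior[c]
            for f in self.feature_names:
                p = self.p_true[c][f]
                score += math.log(p if x[f] else 1.0 - p)
            return score

        return max(self.classes, key=log_posterior)


def precision_recall(y_true, y_pred):
    """Precision and recall for the positive (True) class."""
    tp = sum(t and p for t, p in zip(y_true, y_pred))
    fp = sum((not t) and p for t, p in zip(y_true, y_pred))
    fn = sum(t and (not p) for t, p in zip(y_true, y_pred))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Toy, hand-labelled sentences (True = recipe-containing); not the project's data.
train = [
    ("TiO2 powder (5 g) was dissolved in 50 mL of ethanol.", True),
    ("The mixture was heated to 80 °C and stirred for 2 h.", True),
    ("NaCl was added and the solution was dried overnight.", True),
    ("Previous studies have reported similar band gaps.", False),
    ("These results suggest improved conductivity.", False),
    ("Figure 3 shows the XRD pattern of the sample.", False),
]
test = [
    ("The precursor was stirred in 20 mL of water.", True),
    ("Such materials are widely studied.", False),
    ("The XRD data confirm a TiO2 phase.", False),
    ("Copper nitrate was used as the precursor.", True),
]

model = BernoulliNaiveBayes().fit(
    [features(s) for s, _ in train], [label for _, label in train]
)
predictions = [model.predict(features(s)) for s, _ in test]
precision, recall = precision_recall([label for _, label in test], predictions)
print(f"precision={precision:.2f} recall={recall:.2f}")  # precision=1.00 recall=0.50
```

On this toy set the classifier misses the recipe sentence that carries none of the surface cues ("Copper nitrate was used as the precursor."), which illustrates one way sparse features can yield high precision alongside low recall.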

Creative Commons License

This work is licensed under a Creative Commons Attribution-Noncommercial 4.0 License

Identifying Sentences with Recipe Information with Natural Language Processing
