Student Major/Year in School
Computer Science, 2nd year
Faculty Mentor Information
Dr. William Hsu, Computer Science, College of Engineering
Abstract
The overall goal of this research is to extract, from published text descriptions, the steps for performing a specified procedure, producing a recipe that lists the materials, operations, and conditions the procedure requires. For example, if the procedure is the synthesis of a nanomaterial and the source text consists of peer-reviewed scientific publications, the recipe should include the raw materials and unit operations, among other specifications of the chemical engineering process. This project focuses on developing performance measures that evaluate extracted recipe steps for correctness, completeness, and non-redundancy. Automatically extracted steps are compared against manually annotated reference recipes that represent the desired output, and any discrepancies are used to improve how recipes are organized. Each annotator manually compiles a set of reference recipes, compares them against the automatically extracted ones, and tallies errors according to a standard developed in collaboration with subject matter experts; the annotator then audits a different set of scientific papers marked up by another annotator. This mutual auditing lets a group of annotators check each other's work and ensure that the reference recipes are compiled correctly. The corpus of experimental documents was collected from open access web archives using a web crawler. The documents were then filtered to identify which ones are scientific papers, ranked by relevance, and finally divided into segments from which structured information about the specified ingredients and steps was extracted. My main task in this research is to measure how improved extraction rules affect the rate at which steps are correctly captured.
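The abstract names three measures (correctness, completeness, non-redundancy) but does not say how they are computed. A minimal sketch, assuming each recipe is a list of step strings and that an extracted step "matches" a reference step when the two are identical after lowercasing and collapsing whitespace, could score them as precision-like, recall-like, and duplicate-rate figures. The project's actual error tallies follow a standard developed with subject matter experts, so this exact-match rule is only a stand-in. The names `normalize`, `score_recipe`, and `StepScores`, and the example steps, are hypothetical.

```python
from dataclasses import dataclass


def normalize(step: str) -> str:
    """Lowercase and collapse whitespace (hypothetical matching rule)."""
    return " ".join(step.lower().split())


@dataclass
class StepScores:
    correctness: float     # share of distinct extracted steps found in the reference
    completeness: float    # share of reference steps that were extracted
    non_redundancy: float  # share of extracted steps that are not repeats


def score_recipe(extracted: list[str], reference: list[str]) -> StepScores:
    """Score one extracted recipe against one annotated reference recipe."""
    ext = [normalize(s) for s in extracted]
    ref = {normalize(s) for s in reference}
    distinct = set(ext)
    matched = distinct & ref
    correctness = len(matched) / len(distinct) if distinct else 0.0
    completeness = len(matched) / len(ref) if ref else 0.0
    non_redundancy = len(distinct) / len(ext) if ext else 1.0
    return StepScores(correctness, completeness, non_redundancy)


# Hypothetical reference recipe and outputs from two versions of the extraction rules.
reference = ["Mix precursor A with solvent", "Heat to 80 C",
             "Add reducing agent", "Wash product"]
baseline = ["Mix precursor A with solvent", "mix precursor A  with solvent",
            "Add reducing agent"]
improved = ["Mix precursor A with solvent", "Heat to 80 C",
            "Add reducing agent", "Dry overnight"]

baseline_scores = score_recipe(baseline, reference)
improved_scores = score_recipe(improved, reference)
# Completeness tracks the rate of steps correctly captured.
print(f"completeness: {baseline_scores.completeness:.2f} -> "
      f"{improved_scores.completeness:.2f}")  # prints "completeness: 0.50 -> 0.75"
```

Keeping the three numbers separate makes trade-offs visible: in this toy example the improved rules capture more reference steps and remove the duplicate, but also introduce one step absent from the reference, lowering correctness from 1.00 to 0.75.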
Creative Commons License
This work is licensed under a Creative Commons Attribution-Noncommercial 4.0 License
Recommended Citation
Carmona-Andrade, Richard (2019). "Extraction of Recipe Steps from Scientific Papers: The Nanomaterials Synthesis Domain," Kansas State University Undergraduate Research Conference. https://newprairiepress.org/ksuugradresearch/2019/posters/26
Extraction of Recipe Steps from Scientific Papers: The Nanomaterials Synthesis Domain