Presenter Information

Richard Carmona-Andrade

Student Major/Year in School

Computer Science, 2nd year

Faculty Mentor Information

Dr. William Hsu, Computer Science, College of Engineering

Abstract

The overall goal of this research is to extract the steps for performing a specified procedure from published text descriptions, producing a recipe that lists the materials, operations, and conditions the procedure requires. For example, if the procedure is the synthesis of a nanomaterial and the relevant source text consists of peer-reviewed scientific publications, the recipe should include raw materials and unit operations, among other specifications of a chemical engineering process. This project focuses on developing performance measures that evaluate recipe steps by gauging their correctness, completeness, and non-redundancy. Automatically extracted steps are compared against manually annotated documents that convey the desired results, and any discrepancies are used to improve how recipes are organized. Each annotator manually compiles a set of reference recipes to compare against automatically extracted ones, tallies errors according to a standard developed in collaboration with subject matter experts, and then audits a different set of scientific papers marked up by another annotator. This auditing process lets the annotators check one another's work to ensure that recipes are compiled correctly. A corpus of experimental documents was collected from open-access web archives using a web crawler. These documents were filtered to determine which ones are scientific papers, ranked by relevance, and finally divided so that structured information about the specified ingredients and steps could be extracted. My main task in this research is to measure the impact of improved extraction rules on the rate at which steps are correctly captured.
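
The three performance measures named above can be illustrated with a minimal sketch. It is not the project's scoring standard, which was developed with subject matter experts and is not described here. The sketch assumes that correctness can be approximated as the share of distinct extracted steps that appear in the reference recipe, completeness as the share of reference steps that were extracted, and non-redundancy as the share of extracted steps that are not repeats. The `Step` fields, the function names, and the sample recipe are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    """One recipe step. The three fields are an assumed, simplified schema."""
    operation: str
    material: str
    condition: str = ""


def normalize(step: Step) -> Step:
    """Lowercase and trim each field so formatting differences are not counted as errors."""
    return Step(
        step.operation.strip().lower(),
        step.material.strip().lower(),
        step.condition.strip().lower(),
    )


def evaluate_steps(reference: list[Step], extracted: list[Step]) -> dict[str, float]:
    """Score extracted steps against a manually compiled reference recipe."""
    ref = {normalize(s) for s in reference}
    ext = [normalize(s) for s in extracted]
    unique_ext = set(ext)
    matched = ref & unique_ext
    return {
        # Share of distinct extracted steps that are in the reference.
        "correctness": len(matched) / len(unique_ext) if unique_ext else 0.0,
        # Share of reference steps that the extractor found.
        "completeness": len(matched) / len(ref) if ref else 0.0,
        # Share of extracted steps that are not repeats of an earlier step.
        "non_redundancy": len(unique_ext) / len(ext) if ext else 1.0,
    }


# Hypothetical reference recipe and extractor output, for illustration only.
reference = [
    Step("dissolve", "silver nitrate", "in water"),
    Step("add", "sodium borohydride", "dropwise"),
    Step("stir", "solution", "30 min"),
]
extracted = [
    Step("Dissolve", "silver nitrate", "in water"),
    Step("stir", "solution", "30 min"),
    Step("stir", "solution", "30 min"),  # repeated step
    Step("heat", "solution", "80 C"),    # not in the reference
]
scores = evaluate_steps(reference, extracted)
```

Exact matching after normalization is the simplest possible comparison. A tally kept by human annotators would likely also credit partially correct steps, which this sketch does not attempt.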

Creative Commons License

This work is licensed under a Creative Commons Attribution-Noncommercial 4.0 License


Extraction of Recipe Steps from Scientific Papers: The Nanomaterials Synthesis Domain
