|
Information Extraction
|
| Preamble |
Information Extraction (IE) is concerned with
selecting salient facts about a given topic from documents.
Typically, these facts are then entered automatically into a database, which may
then be used for further processing. IE is a technology based on analyzing natural language.
Unlike Information Retrieval (IR), which finds relevant documents from a collection
of documents and presents them to the user, IE analyses text and presents only the
specific information the user is interested in.
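To make the contrast with IR concrete, the following is a minimal sketch of pattern-based extraction, written under stated assumptions: the `MedicationFact` template, the `MEDICATION_PATTERN` regular expression, and the sample note are all hypothetical and are not taken from any system described on this page. Where an IR system would return the whole note, the sketch returns only the structured facts, ready to be stored in a database.

```python
import re
from dataclasses import dataclass
from typing import List


@dataclass
class MedicationFact:
    """Hypothetical template (database row) for one extracted fact."""
    drug: str
    dose: float
    unit: str


# Assumed toy pattern: a capitalized drug name followed by a dose and a unit.
MEDICATION_PATTERN = re.compile(
    r"\b(?P<drug>[A-Z][a-z]+)\s+(?P<dose>\d+(?:\.\d+)?)\s*(?P<unit>mg|g|ml)\b"
)


def extract_medications(text: str) -> List[MedicationFact]:
    """Return one filled template per pattern match in the text."""
    return [
        MedicationFact(m.group("drug"), float(m.group("dose")), m.group("unit"))
        for m in MEDICATION_PATTERN.finditer(text)
    ]


note = (
    "Patient reports headache. Started Ibuprofen 400 mg twice daily; "
    "Amoxicillin 0.5 g was stopped."
)
facts = extract_medications(note)
for fact in facts:
    print(fact)
```

A single regular expression is far too brittle for real clinical text; it only illustrates the step from free text to template slots.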
|
| Definitions |
"The identification of instances of a particular class of events or relationships
in a natural language text, and the extraction of the relevant arguments of the event or
relationship." It "involves the creation of a structured representation (such
as a data base) of selected information drawn from the text." [Grishman, 1997]
"A subfield of natural language processing that is concerned with identifying
predefined types of information from text." [Riloff, 1999]
"An emerging NLP technology whose function is to process unstructured, natural
language text, to locate specific pieces of information, or facts in the text, and to use
these facts to fill a database." [Yangarber, 2001]
"The task of filling template information from previously unseen text which
belongs to a pre-defined domain." [Peshkin & Pfeffer, 2003]
"A technology based on analyzing natural language in order to extract snippets of
information." [Cunningham, 2005]
|
| IE and Healthcare |
The growing use of modern information technologies is increasing the number of documents and the amount of information and data accessible to health professionals. To cope with this information overload, intelligent techniques are needed that support the information-seeking tasks of health professionals. Information Extraction is a technique that aims at identifying relevant information, structuring it, and providing means to add semantics.
IE systems have been designed to summarize medical patient records by extracting symptoms, diagnoses, physical findings, test results, and therapeutic treatments. These systems can be used to assist health care providers with quality assurance studies, or to support insurance processing, in which case each patient encounter must be categorized for reimbursement purposes. Other IE systems are used to summarize (multiple) medical scientific articles.
Successful term identification is key to getting access to the stored literature information, as it is the terms (and their relationships) that convey knowledge across scientific articles. Due to the complexities of a dynamically changing biomedical terminology, term identification has been recognized as the current bottleneck in text mining and, as a consequence, has become an important research topic in both the natural language processing and biomedical communities.
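The review by Krauthammer and Nenadic listed in the references breaks term identification into three steps: term recognition, term classification, and term mapping. The sketch below illustrates those steps with a simple dictionary lookup. It is an assumption-laden toy: the `LEXICON` entries, the class labels, and the identifiers such as `GENE:0001` are invented for illustration and do not come from any real terminology resource.

```python
import re
from typing import Dict, List, Tuple

# Hypothetical mini-lexicon: surface variant -> (semantic class, concept id).
# Class labels and identifiers are made up for this example.
LEXICON: Dict[str, Tuple[str, str]] = {
    "tp53": ("gene_or_protein", "GENE:0001"),
    "p53": ("gene_or_protein", "GENE:0001"),
    "tumor protein p53": ("gene_or_protein", "GENE:0001"),
    "aspirin": ("drug", "DRUG:0042"),
    "acetylsalicylic acid": ("drug", "DRUG:0042"),
}

# Longer variants come first so that the longest variant wins at each position.
_variants = sorted(LEXICON, key=len, reverse=True)
_TERM_RE = re.compile(
    r"\b(?:" + "|".join(re.escape(v) for v in _variants) + r")\b",
    re.IGNORECASE,
)


def identify_terms(text: str) -> List[Tuple[str, str, str]]:
    """Return (surface form, semantic class, concept id) for each term found."""
    results = []
    for match in _TERM_RE.finditer(text):      # step 1: term recognition
        surface = match.group(0)
        sem_class, concept_id = LEXICON[surface.lower()]
        # step 2 (classification) and step 3 (mapping) are both lookups here
        results.append((surface, sem_class, concept_id))
    return results


sentence = (
    "Acetylsalicylic acid did not alter TP53 expression, "
    "although tumor protein p53 levels rose."
)
terms = identify_terms(sentence)
for term in terms:
    print(term)
```

A fixed lexicon cannot keep up with a changing terminology, which is one reason the text calls term identification a bottleneck; the sketch only shows how spelling variants can be mapped to one shared concept.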
Generating computer-interpretable clinical practice guidelines is a challenging, burdensome, and time-consuming task. IE aims to facilitate this task by automating parts of the modeling process: relevant information is identified and extracted, and can then be processed further into its final representation.
|
| Issues |
Research issues include:
- Portability
- Natural language understanding
- Evaluation
- Providing both universal and domain knowledge.
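For the evaluation issue, extraction quality is commonly reported as precision and recall, the measures named in the Yangarber abstract in the references. The following is a minimal sketch of how they could be computed by comparing extracted facts with a manually annotated gold standard. The `(slot, value)` fact representation and the sample data are assumptions made for illustration, and exact matching is a simplification.

```python
from typing import Set, Tuple

Fact = Tuple[str, str]  # assumed representation: (slot name, slot value)


def precision_recall_f1(extracted: Set[Fact], gold: Set[Fact]) -> Tuple[float, float, float]:
    """Score extracted facts against gold-standard facts by exact match."""
    true_positives = len(extracted & gold)
    precision = true_positives / len(extracted) if extracted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical gold annotations and system output for one patient record.
gold = {
    ("symptom", "headache"),
    ("symptom", "nausea"),
    ("diagnosis", "migraine"),
    ("treatment", "ibuprofen"),
}
extracted = {
    ("symptom", "headache"),
    ("treatment", "ibuprofen"),
    ("diagnosis", "influenza"),
}

precision, recall, f1 = precision_recall_f1(extracted, gold)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

Two of the three extracted facts are correct and two of the four gold facts are found, so precision is 2/3 and recall is 1/2. Real evaluations often need partial-match rules, which this sketch leaves out.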
|
| Current research |
A number of research groups are developing tools and methods for summarizing medical documents. Other groups use Information Extraction techniques for building medical and biomedical ontologies. Current research into the use of Information Extraction in guideline-based care focuses on support for the knowledge acquisition process (the authoring of computer-interpretable guidelines and protocols).
For links to individual projects, see below.

AMBIT
|
Acquiring Medical and Biological Information from Text.
[Gaizauskas et al., 2003b; Harkema et al., 2005b].
|
CLEF
|
Clinical E-Science Framework. This project aims to extract information such as symptoms,
diagnoses, and treatments from clinical records of cancer patients.
[Harkema et al., 2005a; Harkema et al., 2005b]
|
MyGRID
|
MyGRID aims to produce a virtual laboratory workbench for biological researchers, one component of which is Information Extraction from the biological research literature.
[Gaizauskas et al., 2003b]
|
PERSIVAL / Centrifuser
|
Centrifuser is the summarization engine [Kan et al., 2001] of the PERSIVAL (Personalized Retrieval and Summarization of Image, Video and Language) project, which tailors search, presentation, and summarization of online medical literature and consumer health information to both the patient and the healthcare provider.
[Elhadad et al., 2005; Elhadad & McKeown, 2001]
|
MiTAP
|
MITRE Text and Audio Processing. MiTAP monitors multiple information sources such as epidemiological reports, newswire feeds, email, online news, television and radio news in multiple languages in order to monitor infectious disease outbreaks or other biological threats.
[Damianos et al., 2002]
|
MUSI
|
Multilingual Summarization for the Internet. MUSI is a cross-lingual summarization system using articles from The Journal of Anesthesiology as input.
[Lenci et al., 2002]
|
TRESTLE
|
Text Retrieval Extraction and Summarization Technologies for Large Enterprises. TRESTLE produces single-sentence summaries of Scrip pharmaceutical newsletters.
[Gaizauskas et al., 2001]
|
LASSIE
|
modeLing treAtment proceSSes using Information Extraction.
LASSIE is a methodology that uses Information Extraction to semi-automatically model treatment processes from clinical practice guidelines.
[Kaiser et al., 2006; Kaiser & Miksch, 2006]
|
EviX
|
Facilitating Evidence-based Decision Support Using Information Extraction and Clinical Guidelines.
EviX aims to apply Information Extraction methods to semi-automatically create computer-interpretable clinical guidelines and to facilitate the execution of evidence-based recommendations.
[Kaiser et al., 2006; Kaiser & Miksch, 2006]
|
EMPathIE
|
EMPathIE aims to apply Information Extraction technology to bioinformatics tasks
such as the construction of an on-line searchable database from academic journal articles.
[Humphreys et al., 2000]
|
OntoGene
|
OntoGene focuses on the extraction of semantic relations between specific biological entities (such as Genes and Proteins) from the scientific literature (e.g., PubMed).
[Rinaldi et al., 2005]
|
PASTA
|
PASTA aims to create a database of protein active sites using text extraction methods.
[Humphreys et al., 2000; Gaizauskas et al., 2003a]
|
PertoMed
|
PertoMed applies NLP tools to corpora to develop the resources needed to build a medical ontology. An important task is to create textual corpora necessary for NLP tools.
[Baneyx et al., 2005]
|

|
| References: general |
Douglas E. Appelt.
Introduction to Information Extraction.
AI Communications, 12:161-172, 1999.
|
"
In recent years, analysts have been confronted with the increasing availability of on-line sources of information in the form of natural-language texts. This increased accessibility of textual information has led to a corresponding interest in technology for processing this text automatically to extract task-relevant information. This demand for a technological solution to the need to deal with the often-overwhelming quantity of available information has stimulated the development of the field of Information Extraction. This article provides an overview of the problems addressed, current approaches toward solutions, and assesses the state of the art and its potential for future progress.
"
|
Hamish Cunningham.
Information Extraction, Automatic.
In Keith Brown (ed.), Encyclopedia of Language and Linguistics, vol. 1-14, 2nd Edition, Elsevier Science Publishers, p.665-677, 2005.
|
"
This article describes information extraction (IE), the process of deriving disambiguated quantifiable data from natural language texts in service of some prespecified precise information need. The article covers the origins of IE and the factors relevant to its deployment in applications contexts, presents scenarios in which the technology has been applied, breaks down the task into five subtasks and defines them, and looks at recent developments in the field.
"
|
Ralph Grishman.
Information Extraction: Techniques and Challenges.
In M.T. Pazienza (ed.), Information Extraction (International Summer School SCIE-97), Springer-Verlag, 1997.
|
Wendy Lehnert, Claire Cardie, David Fisher, Joseph McCarthy, Ellen Riloff, and Stephen Soderland.
Evaluating an Information Extraction System. Journal of Integrated Computer-Aided Engineering, 1(6):453-472, 1994.
|
"
Many natural language researchers are now turning their attention to a relatively new task orientation known as information extraction. Information extraction systems are predicated on an I/O orientation that makes it possible to conduct formal evaluations and meaningful cross-system comparisons. This paper presents the challenge of information extraction and shows how information extraction systems are currently being evaluated. We describe a specific system developed at the University of Massachusetts, identify key research issues of general interest, and conclude with some observations about the role of performance evaluations as a stimulus for basic research.
"
|
Leonid Peshkin and Avi Pfeffer.
Bayesian Information Extraction Network.
In Georg Gottlob and Toby Walsh (eds.), Proceedings of the 18th International Joint Conference on Artificial Intelligence, p. 421-426, Morgan Kaufmann, 2003.
|
"
Dynamic Bayesian networks (DBNs) offer an elegant way to integrate various aspects of language in one model. Many existing algorithms developed for learning and inference in DBNs are applicable to probabilistic language modeling. To demonstrate the potential of DBNs for natural language processing, we employ a DBN in an information extraction task. We show how to assemble wealth of emerging linguistic instruments for shallow parsing, syntactic and semantic tagging, morphological decomposition, named entity recognition etc. in order to incrementally build a robust information extraction system. Our method outperforms previously published results on an established benchmark domain.
"
|
Ellen Riloff.
Information Extraction as a Stepping Stone toward Story Understanding.
In A. Ram, K. Moorman (eds.) Understanding Language Understanding: Computational Models in Reading, The MIT Press, 1999.
|
Roman Yangarber.
Scenario Customization for Information Extraction.
PhD thesis, New York University, New York, NY, January 2001.
[PubMed]
|
"
Information Extraction (IE) is an emerging NLP technology, whose function is to process unstructured, natural language text, to locate specific pieces of information, or facts, in the text, and to use these facts to fill a database. IE systems today are commonly based on pattern matching. The core IE engine uses a cascade of sets of patterns of increasing linguistic complexity. Each pattern consists of a regular expression and an associated mapping from syntactic to logical form. The pattern sets are customized for each new topic, as defined by the set of facts to be extracted.
Construction of a pattern base for a new topic is recognized as a time-consuming and expensive process - a principal roadblock to wider use of IE technology in the large. An effective pattern base must be precise and have wide coverage. This thesis addresses the portability problem in two stages. First, we introduce a set of tools for building patterns manually from examples. To adapt the IE system to a new subject domain quickly, the user chooses a set of example sentences from a training text, and specifies how each example maps to the extracted event - its logical form.
The system then applies meta-rules to transform the example automatically into a general set of patterns. This effectively shifts the portability bottleneck from building patterns to finding good examples. Second, we propose a novel methodology for discovering good examples automatically from a large un-annotated corpus of text. The system is initially seeded with a small set of good patterns given by the user. An incremental learning procedure then identifies new patterns and classes of related terms on successive iterations. We present experimental results, which confirm that the discovered patterns exhibit high quality, as measured in terms of precision and recall.
"
|
|
| References: Information Extraction in medicine |
Afantenos S, Karkaletsis V, Stamatopoulos P.
Summarization from medical documents: a survey.
Artif Intell Med. 2005 Feb;33(2):157-77.
[PubMed]
|
"
Objective: The aim of this paper is to survey the recent work in medical documents summarization.
Background: During the last decade, documents summarization got increasing attention by the AI research community. More recently it also attracted the interest of the medical research community as well, due to the enormous growth of information that is available to the physicians and researchers in medicine, through the large and growing number of published journals, conference proceedings, medical sites and portals on the World Wide Web, electronic medical records, etc.
Methodology: This survey gives first a general background on documents summarization, presenting the factors that summarization depends upon, discussing evaluation issues and describing briefly the various types of summarization techniques. It then examines the characteristics of the medical domain through the different types of medical documents. Finally, it presents and discusses the summarization techniques used so far in the medical domain, referring to the corresponding systems and their characteristics.
Discussion and conclusions: The paper discusses thoroughly the promising paths for future research in medical documents summarization. It mainly focuses on the issue of scaling to large collections of documents in various languages and from different media, on personalization issues, on portability to new sub-domains, and on the integration of summarization technology in practical applications.
"
|
Audrey Baneyx, Jean Charlet, and Marie-Christine Jaulent.
Building Medical Ontologies Based on Terminology Extraction from Texts: Methodological Propositions.
In S. Miksch, J. Hunter, E. Keravnou (eds.) Proceedings of the 10th Conference on Artificial Intelligence in Medicine in Europe (AIME 2005), Aberdeen, Scotland, LNAI 3581, p. 231-235, 2005.
|
"
In the medical field, it is now established that the maintenance of unambiguous thesauri is accomplished by the building of ontologies. Our task in the PertoMed project is to help pneumologists code acts and diagnoses with a software that represents medical knowledge by an ontology of the concerned specialty. We apply natural language processing tools to corpora to develop the resources needed to build this ontology. In this paper, our objective is to develop a methodology for the knowledge engineer to build various types of medical ontologies based on terminology extraction from texts according to the differential semantics theory. Our main research hypothesis concerns the joint use of two methods: distributional analysis and recognition of semantic relationships by lexico-syntactic patterns. The expected result is the building of an ontology of pneumology.
"
|
Laurie Damianos, Jay Ponte, Steve Wohlever, Florence Reeder, David Day, George Wilson,
and Lynette Hirschman.
MiTAP for Bio-Security: A Case Study.
AI Magazine, 23(4):13-29, 2002.
|
"
MiTAP (MITRE Text and Audio Processing) is a prototype system available for monitoring infectious disease outbreaks and other global events. MiTAP focuses on providing timely, multi-lingual, global information access to medical experts and individuals involved in humanitarian assistance and relief work. Multiple information sources in multiple languages are automatically captured, filtered, translated, summarized, and categorized by disease, region, information source, person, and organization. Critical information is automatically extracted and tagged to facilitate browsing, searching, and sorting. The system supports shared situational awareness through collaboration, allowing users to submit other articles for processing, annotate existing documents, post directly to the system, and flag messages for others to see. MiTAP currently stores over one million articles and processes an additional 2000 to 10,000 daily, delivering up-to-date information to dozens of regular users.
"
|
Noemie Elhadad and Kathleen McKeown.
Towards generating patient specific summaries of medical articles.
In Proceedings of the Automatic Summarization Workshop (NAACL 2001), 2001.
|
"
The end users of medical digital libraries need quick access to information that is specific to the patients under their care. We present a summarization system that finds and extracts results from multiple medical journal articles returned by a search, filters results that match the patient and merges and orders the remaining facts for the summary. Our approach features an integration of text categorization, information extraction, information fusion and text reformulation for the summarization task.
"
|
Elhadad N, McKeown K, Kaufman D, Jordan D.
Facilitating physicians' access to information via tailored text summarization.
AMIA Annu Symp Proc. 2005;:226-30.
[PubMed]
|
"
We have developed a summarization system, TAS (Technical Article Summarizer), which, when provided with a patient record and journal articles returned by a search, automatically generates a summary that is tailored to the patient characteristics. We hypothesize that a personalized summary will allow a physician to more quickly find information relevant to patient care. In this paper, we present a user study in which subjects carried out a task under three different conditions: using search results only, using a generic summary and search results, and using a personalized summary with search results. Our study demonstrates that subjects do a better job on task completion with the personalized summary, and show a higher level of satisfaction, than under other conditions.
"
|
Elhadad N, Kan MY, Klavans JL, McKeown KR.
Customization in a unified framework for summarizing medical literature.
Artif Intell Med. 2005 Feb;33(2):179-98.
[PubMed]
|
"
OBJECTIVE: We present the summarization system in the PErsonalized Retrieval and Summarization of Images, Video and Language (PERSIVAL) medical digital library. Although we discuss the context of our summarization research within the PERSIVAL platform, the primary focus of this article is on strategies to define and generate customized summaries. METHODS AND MATERIAL: Our summarizer employs a unified user model to create a tailored summary of relevant documents for either a physician or lay person. The approach takes advantage of regularities in medical literature text structure and content to fulfill identified user needs. RESULTS: The resulting summaries combine both machine-generated text and extracted text that comes from multiple input documents. Customization includes both group-based modeling for two classes of users, physician and lay person, and individually driven models based on a patient record. CONCLUSIONS: Our research shows that customization is feasible in a medical digital library.
"
|
Robert Gaizauskas, Patrick Herring, Michael Oakes, Micheline Beaulieu,
Peter Willett, Helene Fowkes, and Anna Jonsson.
Intelligent Access to Text: Integrating Information Extraction Technology into Text Browsers.
In Proceedings of the Human Language Technology Conference (HLT 2001), 2001.
|
"
In this paper we show how two standard outputs from information extraction (IE) systems - named entity annotations and scenario templates - can be used to enhance access to text collections via a standard text browser. We describe how this information is used in a prototype system designed to support information workers' access to a pharmaceutical news archive as part of their "industry watch" function. We also report results of a preliminary, qualitative user evaluation of the system, which while broadly positive indicates further work needs to be done on the interface to make users aware of the increased potential of IE-enhanced text browsers.
"
|
Gaizauskas R, Demetriou G, Artymiuk PJ, Willett P.
Protein structures and information extraction from biological texts: the PASTA system.
Bioinformatics. 2003 Jan;19(1):135-43.
[PubMed]
|
"
Motivation: The rapid increase in volume of protein structure literature means useful information may be hidden or lost in the published literature and the process of finding relevant material, sometimes the rate-determining factor in new research, may be arduous and slow.
Results: We describe the Protein Active Site Template Acquisition (PASTA) system, which addresses these problems by performing automatic extraction of information relating to the roles of specific amino acid residues in protein molecules from online scientific articles and abstracts. Both the terminology recognition and extraction capabilities of the system have been extensively evaluated against manually annotated data and the results compare favourably with state-of-the-art results obtained in less challenging domains. PASTA is the first information extraction (IE) system developed for the protein structure domain and one of the most thoroughly evaluated IE systems operating on biological scientific text to date.
Availability: PASTA makes its extraction results available via a browser-based front end: http://www.dcs.shef.ac.uk/nlp/pasta/. The evaluation resources (manually annotated corpora) are also available through the website: http://www.dcs.shef.ac.uk/nlp/pasta/results.html.
"
|
Robert Gaizauskas, Mark Hepple, Neil Davis, Yikun Guo, Henk Harkema, Angus Roberts,
and Ian Roberts.
AMBIT: Acquiring Medical and Biological Information from Text.
In S.J. Cox (ed.) Proceedings of the Second UK e-Science All Hands Meeting, Nottingham, UK, 2003.
|
"
We introduce and motivate the AMBIT system for extracting information from biomedical texts, which is currently under development at the University of Sheffield for use within two ongoing E-Science projects (myGrid and CLEF).
"
|
Henk Harkema, Andrea Setzer, Robert Gaizauskas, Mark Hepple, R. Power, and J. Rogers.
Mining and Modelling Temporal Clinical Data.
In Proceedings of the 4th UK e-Science All Hands Meeting, Nottingham, UK, 2005.
|
"
The Clinical e-Science Framework (CLEF) demonstrator runs Information Extraction technology
over textual, narrative patient notes to assemble repositories of clinical patient data
for the purposes of biomedical research and clinical care.
Since many important medical events in the course of a patient's treatment are mentioned
in multiple documents and most documents will only include partial descriptions of these
events, constructing a coherent and complete summary of a patient's history - what we call
a patient chronicle - requires an information integration step over the output of
Information Extraction. In this paper we describe and evaluate an approach to information integration which is based on mining narrative patient notes for temporal properties of medically relevant events and combining these with temporal information about events as provided by the structured (i.e., non-narrative) part of a patient's health record.
"
|
Henk Harkema, Ian Roberts, Robert Gaizauskas, and Mark Hepple.
Information Extraction from Clinical Records.
In S.J. Cox (ed.), Proceedings of the 4th UK e-Science All Hands Meeting, Nottingham, UK, 2005.
|
"
Much of the wealth of information that exists in patient clinical records is difficult or impractical to access, due to it being recorded in unstructured textual formats and the large volume of records available. To facilitate access to this information, we introduce AMBIT: a text analysis system designed to extract key information from clinical and biomedical text. Information derived in this way, and stored in a structured format, can be used in various ways to assist in the provision and development of clinical care. In this paper we discuss the architecture and functionality of AMBIT, and present evaluation results regarding its performance on an information extraction task in the medical domain.
"
|
Kevin Humphreys, George Demetriou, and Robert Gaizauskas.
Two Applications of Information Extraction to Biological Science Journal Articles:
Enzyme Interactions and Protein Structures.
In Proceedings of the Pacific Symposium on Biocomputing, p. 505-516, 2000.
|
"
Information extraction technology, as defined and developed through the U.S. DARPA
Message Understanding Conferences (MUCs), has proved successful at extracting information
primarily from newswire texts and primarily in domains concerned with human activity.
In this paper we consider the application of this technology to the extraction of information from scientific journal papers in the area of molecular biology. In particular, we describe how an information extraction system designed to participate in the MUC exercises has been modified for two bioinformatics applications: EMPathIE, concerned with enzyme and metabolic pathways; and PASTA, concerned with protein structure. Progress to date provides convincing grounds for believing that IE techniques will deliver novel and effective ways for scientists to make use of the core literature which defines their disciplines.
"
|
Katharina Kaiser, Cem Akkaya, and Silvia Miksch.
How Can Information Extraction Ease Formalizing Treatment Processes in Clinical Practice
Guidelines?
Artificial Intelligence in Medicine, to appear.
|
"
Objective. Formalizing clinical practice guidelines for a subsequent computer-supported processing is a challenging, but burdensome and time-consuming task. Existing methods and tools to support this task demand detailed medical knowledge, knowledge about the formal representations, and a manual modeling. Furthermore, formalized guideline documents mostly fall far short in terms of readability and understandability for the human domain modeler.
Methods and Material. We propose a new multi-step approach using information extraction methods to support the human modeler by both automating parts of the modeling process and making the modeling process traceable and comprehensible. This paper addresses the first steps to obtain a representation containing processes, which is independent of the final guideline representation language.
Results. We have developed and evaluated several heuristics without the need to apply Natural Language Understanding and implemented them in a framework to apply them to several guidelines from the medical subject of otolaryngology. Findings in the evaluation indicate that using semi-automatic, step-wise information extraction methods are a valuable instrument to formalize CPGs.
Conclusion. Our evaluation shows that a heuristic-based approach can achieve good results, especially for guidelines with a major portion of semi-structured text. It can be applied to guidelines irrespective to the final guideline representation format.
"
|
Katharina Kaiser and Silvia Miksch.
Modeling Treatment Processes Using Information Extraction.
In Lakhmi Jain (ed.) Computational Intelligence in Healthcare, to appear.
|
"
Clinical Practice Guidelines (CPGs) are important means to improve the quality of care by supporting medical staff. Modeling CPGs in a computer-interpretable form is a prerequisite for various computer applications to support their application. However, transforming guidelines in a formal guideline representation is a difficult task. Existing methods and tools demand detailed medical knowledge, knowledge about the formal representations, and a manual modeling.
In this chapter we introduce methods and tools for formalizing CPGs and we propose a methodology to reduce the human effort needed in the translation from original textual guidelines to formalized processable knowledge bases.
The idea of our methodology is to use Information Extraction methods to help in the semi-automation of guideline content formalization of treatment processes. Thereby, the human modeler will be supported by both automating parts of the modeling process and making the modeling process traceable and comprehensible.
Our methodology, called LASSIE, represents a novel method applying a stepwise procedure. The general idea is to use this method to formalize guidelines in any guideline representation language by applying both general steps (i.e., language-independent) and language-specific steps.
In order to evaluate both the methodology and the Information Extraction system, a framework was implemented and applied to several guidelines from the medical subject of otolaryngology. The framework has been applied to formalize the guidelines in the formal Asbru plan representation. Findings in the evaluation indicate that using semi-automatic, stepwise Information Extraction methods are a valuable instrument to formalize CPGs.
"
|
Min-Yen Kan, Kathleen McKeown, and Judith L. Klavans.
Applying Natural Language Generation to Indicative Summarization.
In Proceedings of the Eighth European Workshop on Natural Language Generation, ACL'01, 2001.
[PubMed]
|
"
The task of creating indicative summaries that help a searcher decide whether to read a particular document is a difficult task. This paper examines the indicative summarization task from a generation perspective, by first analyzing its required content via published guidelines and corpus analysis. We show how these summaries can be factored into a set of document features, and how an implemented content planner uses the topicality document feature to create indicative multidocument query-based summaries.
"
|
Michael Krauthammer and Goran Nenadic.
Term identification in the biomedical literature.
Journal of Biomedical Informatics, 37(6):512-526, 2004.
[PubMed]
|
"
Sophisticated information technologies are needed for effective data acquisition and integration from a growing body of the biomedical literature. Successful term identification is key to getting access to the stored literature information, as it is the terms (and their relationships) that convey knowledge across scientific articles. Due to the complexities of a dynamically changing biomedical terminology, term identification has been recognized as the current bottleneck in text mining, and, as a consequence, has become an important research topic both in natural language processing and biomedical communities. This article overviews state-of-the-art approaches in term identification. The process of identifying terms is analyzed through three steps: term recognition, term classification, and term mapping. For each step, main approaches and general trends, along with the major problems, are discussed. By assessing previous work in context of the overall term identification process, the review also tries to delineate needs for future work in the field.
"
|
Alessandro Lenci, Roberto Bartolini, Nicoletta Calzolari, Ana Aguza, Stephan Busemann,
Emmanuel Cartier, Karine Chevrau, and José Coch.
Multilingual Summarization by Integrating Linguistic Resources in the MLIS-MUSI Project.
In Proceedings of the 3rd International Conference on Language Resources and Evaluation
(LREC'02), May 29-31, Las Palmas, Canary Islands, Spain, 2002.
|
"
In this paper we will illustrate the approach to multilingual automatic abstract production adopted by the EU-sponsored project MLIS MUSI. Although a small scale research project, MUSI has tried to tackle the challenges set by multilingual summarization by adopting an original approach based on the definition of a shared ontology and representation language, and on the reuse of existing linguistic resources. MUSI combines a statistic-based module for relevant sentence extraction and a concept-based component to generate multilingual summaries.
"
|
Fabio Rinaldi, Gerold Schneider, Kaarel Kaljurand, Michael Hess, Christos Andronis, Andreas Persidis, and Ourania Konstanti.
Relation Mining over a Corpus of Scientific Literature.
In S. Miksch, J. Hunter, E. Keravnou (eds.) Proceedings of the 10th Conference on Artificial Intelligence in Medicine in Europe (AIME 2005), Aberdeen, Scotland, p. 535-544, 2005.
|
"
The amount of new discoveries (as published in the scientific literature) in the area of Molecular Biology is currently growing at an exponential rate. This growth makes it very difficult to filter the most relevant results, and the extraction of the core information, for inclusion in one of the knowledge resources being maintained by the research community, becomes very expensive. Therefore, there is a growing interest in text processing approaches that can deliver selected information from scientific publications, which can limit the amount of human intervention normally needed to gather those results.
This paper presents and evaluates an approach aimed at automating the process of extracting semantic relations (e.g., interactions between genes and proteins) from scientific literature in the domain of Molecular Biology. The approach, using a novel dependency-based parser, is based on a complete syntactic analysis of the corpus.
"
|
|
| links |
|
|
|
| acknowledgements |
| Katharina Kaiser, Institute of Software Technology & Interactive Systems, Vienna University of Technology, Vienna, Austria. |
| page history |
Entry on OpenClinical: 25 August 2006
Last main update: 11 September 2006 |
|
|