IAMTC
is a multi-site NSF ITR project focusing on the annotation of
six sizable bilingual parallel corpora for interlingual content
with the goal of providing a significant data set for improving
knowledge-based approaches to machine translation (MT) and a
range of other Natural Language Processing (NLP) applications.
Data is being annotated in the following languages: Spanish,
French, Arabic, Japanese, Korean, and Hindi, along with multiple
English translations of source documents in all of these languages.
The project participants include the Computing
Research Laboratory at NMSU, the
Language Technologies Institute at CMU, the
Information Sciences Institute at USC, UMIACS
at the University of Maryland, the
MITRE Corporation, and Columbia
University.
The central goals
of the project are:
* to produce a practical, commonly shared system
for representing the information conveyed by a text (an interlingua),
* to develop a methodology for accurately
and consistently assigning such representations to texts across
languages and across annotators,
* to annotate a sizable multilingual parallel
corpus of source-language texts and translations for interlingual (IL) content.
This corpus is expected to serve as a basis
for improving meaning-based approaches to MT and a range of
other natural language technologies.
The tools and annotation standards will serve
to facilitate more rapid annotation of texts in the future.