IAMTC is a multi-site NSF ITR project focused on annotating six sizable bilingual parallel corpora for interlingual content. The goal is to provide a significant data set for improving knowledge-based approaches to machine translation (MT) and a range of other natural language processing (NLP) applications. Source documents in Spanish, French, Arabic, Japanese, Korean, and Hindi are being annotated, along with multiple English translations of the source documents in each of these languages.

The project participants include the Computing Research Laboratory at NMSU, the Language Technologies Institute at CMU, the Information Sciences Institute at USC, UMIACS at the University of Maryland, the MITRE Corporation, and Columbia University.

The central goals of the project are:
* to produce a practical, commonly shared system for representing the information conveyed by a text, known as an interlingua (IL),

* to develop a methodology for accurately and consistently assigning such representations to texts across languages and across annotators,

* to annotate a sizable multilingual parallel corpus of source language texts and translations for IL content.

This corpus is expected to serve as a basis for improving meaning-based approaches to MT and a range of other natural language technologies.

The tools and annotation standards will serve to facilitate more rapid annotation of texts in the future.
