UNCLASSIFIED
R-1 Line Item #11
Exhibit R-2a, PB 2010 Defense Advanced Research Projects Agency RDT&E Project Justification DATE: May 2009
APPROPRIATION/BUDGET ACTIVITY
0400 - Research, Development, Test & Evaluation, Defense-Wide/BA 2 - Applied Research
R-1 ITEM NOMENCLATURE
PE 0602303E INFORMATION & COMMUNICATIONS TECHNOLOGY
PROJECT NUMBER
IT-04
B. Accomplishments/Planned Program ($ in Millions)
FY 2008 FY 2009 FY 2010 FY 2011
- Complete the architecture for a summarization system that incorporates adaptive filtering, focused
summarization, information extraction, contradiction detection, and user modeling.
- Develop methods for using extraction-empowered machine translation, where the system extracts
the meaningful phrases (e.g., names and descriptions) from foreign language text for highly accurate
translation into English.
- Continue to transition technologies developed by the GALE program into high-impact military systems
and intelligence operations centers.
- Exercise the language-independent paradigm for new languages essential for military use: Dari, Pashto,
and Urdu.
Multilingual Automatic Document Classification, Analysis and Translation (MADCAT)
(U) The Multilingual Automatic Document Classification, Analysis and Translation (MADCAT) program
will develop and integrate technology to enable exploitation of captured, foreign language, hard-copy
documents. This technology is crucial to the warfighter, as hard-copy documents including notebooks,
letters, ledgers, annotated maps, newspapers, newsletters, leaflets, pictures of graffiti, and document
images (e.g., PDF files, JPEG files, and scanned TIFF images) resident on magnetic and optical media
captured in the field may contain important but perishable information. Unfortunately, due to limited
human resources and the immature state of applicable technology, the Services lack the ability to exploit,
in a timely fashion, ideographic and script documents that are either machine printed or handwritten
in Arabic. The MADCAT program will address this need by producing devices that will convert such
captured documents to readable English in the field. MADCAT will substantially improve the applicable
technologies, in particular document analysis and optical character recognition/optical handwriting
recognition (OCR/OHR). MADCAT will then tightly integrate these improved technologies with translation
technology and create demonstration prototypes for field trials.
FY 2008 Accomplishments:
- Improved methods for document segmentation (e.g., title, address box, columns, lists, embedded
picture/diagram/caption, annotation, and signature block).
- Improved script (e.g., Roman vs. Cyrillic) and language (e.g., Farsi vs. Arabic) identification.
8.131 12.414 16.222