HPC Enables Knowledge Mining
A new research project, UIMA-HPC, has been launched to enable data mining applications to make efficient use of high-performance computing resources. In the first phase, the project will focus on the bio-pharmacological domain, where the PubMed database alone, for example, holds more than 20 million entries. Researchers in this field need answers to questions such as: For a given base structure, are any structure variants already mentioned in the literature, and if so, is there any indication of their effects? Are these structure variants protected by third-party rights, or are they freely available? Simple keyword searches cannot answer such questions. Instead, the relevant information must be delivered to researchers promptly and in a compact, structured form.
The project aims to develop fast and efficient procedures for extracting knowledge from unstructured data in all kinds of sources, such as texts, graphics, tables, diagrams, captions, and blogs. The project partners FHG SCAI, JSC, Taros Chemicals, and scapos will develop a system that embeds UIMA (Unstructured Information Management Architecture), the de facto standard framework for information extraction, into an HPC framework based on UNICORE, thus enabling a new class of knowledge-mining applications. UIMA-HPC is partly funded by the German Federal Ministry of Education and Research (BMBF); it started on 1 April 2011 and will run for three years.
(Contact: Mathilde Romberg)
from JSC News No. 194, April 2011