The goal of this research is to explore or develop models, knowledge bases,
linguistic corpora and tools, presentation metaphors, learning algorithms and
software to produce agents that provide a high level of interaction with
people. The research will develop practical software tools in forms that
allow their exploitation in applications (e.g. as COM components or as Web
services) both within the project and by others.
Filing and retrieving documents is a task where a higher level of interaction
through adaptive agents can be exploited. Web search engines have shown that
people find it convenient to delegate the knowledge of where documents are
stored and to interact with the engine through simple natural language
queries. Nevertheless, it is still the user's responsibility to determine
which documents contain the exact information needed and to extract it in
order to complete the task.
We wish to go beyond the ability to retrieve documents containing pertinent
information and directly support the user's task. The TREC-8 Question
Answering Track has identified this need, stating: "Automatic question answering
will definitely be a significant advance in the state-of-art information
retrieval technology. Systems that can do reliable question answering without
domain restrictions have not been developed yet."
Ask Jeeves is an attempt in this direction. Using natural language processing
technology, Ask Jeeves determines both the meaning of the words in the
question (semantic processing) and the meaning carried by the grammar of the
question (syntactic processing). Ask Jeeves's answer-processing engine
provides several question template responses that contain links to the answer
locations. The user still has to extract the answer from the documents.
Our approach will be to work at the conceptual level. Linguistic analysis
tools and machine learning techniques will be applied to learn concepts from
documents and interactions with users. Identifying concepts and relations
among them in the documents will enable building knowledge bases suitable for
processing and answering questions.
The research will tackle the following issues:
- Concept learning. Extracting relevant concepts from training
collections, exploiting thesauri and ontologies.
- Identification of relations among concepts, creating or extending
thesauri and ontologies.
- Techniques to determine the relevance and authoritativeness of sources
(link analysis, reinforcement learning and belief network techniques can be
used).
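
As an illustration of the link analysis option mentioned in the last item, the
following is a minimal sketch of a PageRank-style power iteration, one
well-known way to estimate the authoritativeness of sources from their link
structure. It is not the method the project commits to; the function name
pagerank, the damping value and the toy link graph are assumptions made only
for this example.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Estimate authority scores from a link graph (illustrative sketch).

    links: dict mapping each source to the list of sources it links to.
    Returns a dict mapping every source to a score; the scores sum to 1.
    """
    # Collect every node, including ones that only appear as link targets.
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    n = len(pages)

    # Start from a uniform distribution.
    rank = {page: 1.0 / n for page in pages}

    for _ in range(iterations):
        # Every page receives the same small "random jump" share.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # Split this page's score evenly among its outgoing links.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its score everywhere.
                share = damping * rank[page] / n
                for target in pages:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical toy graph: "a" and "b" both cite "c", and "c" cites "a".
scores = pagerank({"a": ["c"], "b": ["c"], "c": ["a"]})
```

In this toy graph, "c" is cited by two sources and so ends up with a higher
score than "b", which nobody cites.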
The techniques will be applied to the following tasks:
- Categorization. Supporting automated categorization with tools for
generating category profiles and for concept matching. In particular, we plan
to continue earlier work on search and categorization of documents, refining
the techniques with the addition of semantic linguistic analysis.
- Question answering. Identifying documents in a collection
containing pertinent information and directly answering (simple) questions
posed by users.
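
To make the notion of a category profile in the categorization task more
concrete, here is a minimal sketch in which each profile is a bag of term
counts built from example documents, and a new document is assigned to the
profile it is closest to under cosine similarity. This is a deliberately
simple stand-in: it matches surface words rather than concepts, so it does not
include the semantic linguistic analysis the project intends to add. All
function names and the sample data are hypothetical.

```python
import math
from collections import Counter


def term_vector(text):
    """Turn a text into a bag of lowercase word counts."""
    return Counter(text.lower().split())


def build_profile(documents):
    """Build a category profile by summing the term counts of its examples."""
    profile = Counter()
    for document in documents:
        profile.update(term_vector(document))
    return profile


def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm_a = math.sqrt(sum(count * count for count in a.values()))
    norm_b = math.sqrt(sum(count * count for count in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def categorize(text, profiles):
    """Return the name of the profile most similar to the text."""
    vector = term_vector(text)
    return max(profiles, key=lambda name: cosine(vector, profiles[name]))


# Hypothetical training examples for two categories.
profiles = {
    "sports": build_profile(["football match goal", "tennis match score"]),
    "finance": build_profile(["stock market shares", "bank interest rate"]),
}
label = categorize("the football score", profiles)
```

Replacing term_vector with a function that maps text to concepts drawn from a
thesaurus or ontology would move this sketch closer to the concept matching
described above.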