Tanl Linguistic Pipeline
IXE is a modern object-oriented framework for developing applications that gather, analyze and query collections of documents.
IXE consists of a library of C++ classes that provide the components for building sophisticated text-handling applications, ranging from Information Retrieval and Web Search Engines to Question Answering.
The library has a modular design based on fundamental principles of object-oriented programming and on software design patterns [gamma]. Generative programming techniques are used in two ways: templates parameterise classes, and template metaprogramming gets the C++ compiler to produce specific code for each class. Self-registration of classes allows applications to be composed just by linking their binaries together.
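The self-registration idea can be illustrated with a minimal sketch. This is not IXE's actual API: the names `Reader`, `ReaderRegistry`, `Registrar`, `HtmlReader` and `PdfReader` are hypothetical and chosen only for illustration. The sketch assumes a registry keyed by a format name and a small template whose constructor performs the registration, so that defining one namespace-scope object per class is enough to make that class available at run time.

```cpp
#include <functional>
#include <map>
#include <memory>
#include <string>
#include <utility>

// Hypothetical base class; the real IXE reader interface may differ.
class Reader {
public:
    virtual ~Reader() = default;
    virtual std::string format() const = 0;
};

// Registry mapping a format name to a function that creates a reader.
class ReaderRegistry {
public:
    using Creator = std::function<std::unique_ptr<Reader>()>;

    // A function-local static is constructed on first use, which avoids
    // depending on the initialization order of other static objects.
    static ReaderRegistry& instance() {
        static ReaderRegistry registry;
        return registry;
    }

    void add(const std::string& name, Creator creator) {
        creators_[name] = std::move(creator);
    }

    // Returns nullptr when no class is registered under `name`.
    std::unique_ptr<Reader> create(const std::string& name) const {
        auto it = creators_.find(name);
        return it == creators_.end() ? nullptr : it->second();
    }

private:
    std::map<std::string, Creator> creators_;
};

// Template helper: constructing a Registrar<T> registers T under `name`.
template <typename T>
struct Registrar {
    explicit Registrar(const std::string& name) {
        ReaderRegistry::instance().add(
            name, [] { return std::unique_ptr<Reader>(new T()); });
    }
};

// Each class registers itself through a namespace-scope object whose
// constructor runs during static initialization, before main().
class HtmlReader : public Reader {
public:
    std::string format() const override { return "HTML"; }
};
static Registrar<HtmlReader> registerHtml("html");

class PdfReader : public Reader {
public:
    std::string format() const override { return "PDF"; }
};
static Registrar<PdfReader> registerPdf("pdf");
```

With this arrangement, application code only asks the registry for a reader by name, for example `ReaderRegistry::instance().create("pdf")`, and never mentions the concrete classes. Adding a new class means linking in one more object file. One caveat of this pattern in general: when such a class lives in a static library, the linker may drop an object file that nothing references, so the registration object never runs unless the file is forced into the link.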
The library has been designed for efficiency and scalability. It can handle multiple collections of documents with an overall size of several terabytes. Several kinds of applications can be built with it, including Web search engines, local disk indexing and retrieval, document management systems, and specialized indexing and search of multimedia collections.
The library includes several document readers for parsing and indexing text documents in various formats (HTML, PDF, DOC).
An example of a full Web Search Engine is described in Web Search Engine Example.