PowerDB-IR: Information Retrieval on Top of a Database Cluster

Title PowerDB-IR: Information Retrieval on Top of a Database Cluster
Author(s) T. Grabs, K. Böhm, H.-J. Schek
Type inproceedings
Booktitle Proceedings of ACM CIKM 2001 -- Tenth International Conference on Information and Knowledge Management
Atlanta, GA, USA
Month November
Year 2001


Our current concern is a scalable infrastructure for information retrieval (IR) with up-to-date retrieval results in the presence of frequent, continuous updates. Timely processing of updates is important in novel application domains, e.g., e-commerce. We want to use off-the-shelf hardware and software as much as possible. These issues are challenging, given the additional requirement that the resulting system must scale well. We have built PowerDB-IR, a system that has the characteristics sought. This paper describes its design, implementation, and evaluation. PowerDB-IR is a coordination layer for a database cluster. The rationale behind a database cluster is to "scale out", i.e., to add further cluster nodes whenever necessary for better performance. We build on IR-to-database mappings and service decomposition to support high-level parallelism. We follow a three-tier architecture with the database cluster as the bottom layer for storage management. The middle tier provides IR-specific processing and update services. PowerDB-IR has the following features: it allows documents to be inserted and retrieved concurrently, and it ensures freshness with almost no overhead. Alternative physical data organization schemes provide adequate performance for different workloads. Query processing techniques for the different data organizations efficiently integrate the ranked retrieval results from the cluster nodes. We have run extensive experiments with our prototype using commercial database systems and middleware software products. The main result is that PowerDB-IR shows surprisingly ideal scalability and low response times.
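
The abstract mentions integrating ranked retrieval results from the cluster nodes. The following is a minimal sketch of one way such a merge could look; it is not taken from the paper. It assumes that each node returns its hits already sorted by descending score and that scores are comparable across nodes. The function name merge_ranked and the sample document ids are hypothetical.

```python
import heapq
from itertools import islice


def merge_ranked(node_results, k):
    """Merge per-node ranked lists into one global top-k list.

    Assumption (not from the paper): every list in node_results holds
    (doc_id, score) pairs sorted by descending score, and scores from
    different nodes are directly comparable.
    """
    # heapq.merge lazily merges sorted inputs; negating the score makes
    # the descending lists look ascending to the merge.
    merged = heapq.merge(*node_results, key=lambda hit: -hit[1])
    # Only the first k hits are consumed, so the rest is never touched.
    return list(islice(merged, k))


# Hypothetical results from two cluster nodes.
node_a = [("d1", 0.9), ("d4", 0.4)]
node_b = [("d7", 0.8), ("d2", 0.5), ("d9", 0.1)]

top3 = merge_ranked([node_a, node_b], 3)
print(top3)  # [('d1', 0.9), ('d7', 0.8), ('d2', 0.5)]
```

Because the merge is lazy, the coordinating layer in this sketch reads only as many hits from each node's list as it needs to fill the top k.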

!!! This document is stored in the ETH Web archive and is no longer maintained !!!