A Parallel Document Engine Built on Top of a Cluster of Databases - Design, Implementation, and Experiences
|Title||A Parallel Document Engine Built on Top of a Cluster of Databases - Design, Implementation, and Experiences|
|Author(s)||T. Grabs, K. Böhm, H.-J. Schek|
|Organization||Department of Computer Science, ETH Zurich|
Abstract
We report on the implementation and evaluation of a document engine that efficiently supports many parallel search and concurrent insertion requests and that scales to growing numbers of such requests. We use a cluster of commodity database systems in a shared-nothing architecture. We build on previous results on multi-level transactions and decompose a service request into short parallel database transactions. A coordinator, implemented as an extension of a transaction processing monitor, routes the short transactions to the appropriate database system in the cluster, depending on the data distribution that we have chosen. We have paid much attention to the design and implementation of the coordinator to keep it from becoming a bottleneck. To that end, we have implemented auxiliary functionality such as term extraction as services and distributed them over the cluster. Extensive experiments show the following: (1) A relatively small number of components already suffices to cope with high workloads. (2) The coordinator of the database cluster has minimal impact on CPU resource consumption and on response times. For example, the response time overhead of the coordinator is on the order of milliseconds, while the response time for retrievals and insertions remains within seconds even with 100 parallel search or insertion streams. This is rather unexpected, since the coordinator performs signature-based predicate locking and writes additional logging information. We conclude that a database cluster with a coordinator on top is a good, scalable infrastructure for complex application services.
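The decomposition of a service request into short, per-node transactions can be illustrated with a minimal Python sketch. It is not the system described in the paper. It assumes a hash partitioning of terms across nodes (the abstract does not state the actual data distribution), uses in-memory dictionaries in place of the database systems, and omits the coordinator's predicate locking and logging. All names (`NUM_NODES`, `extract_terms`, `node_for`, `decompose_insert`, `apply_plan`, `search`) are hypothetical.

```python
import re
import zlib
from collections import defaultdict

NUM_NODES = 4  # hypothetical cluster size


def extract_terms(text):
    """Stand-in for a term extraction service: distinct lowercase tokens."""
    return sorted(set(re.findall(r"[a-z0-9]+", text.lower())))


def node_for(term, num_nodes=NUM_NODES):
    """Assumed data distribution: hash-partition the index by term."""
    return zlib.crc32(term.encode("utf-8")) % num_nodes


def decompose_insert(doc_id, text, num_nodes=NUM_NODES):
    """Split one insertion request into one short sub-transaction per node.

    Returns a mapping from node number to the (term, doc_id) postings
    that the sub-transaction on that node would write.
    """
    plan = defaultdict(list)
    for term in extract_terms(text):
        plan[node_for(term, num_nodes)].append((term, doc_id))
    return dict(plan)


def apply_plan(cluster, plan):
    """Run each sub-transaction against its node (here: a plain dict)."""
    for node, postings in plan.items():
        for term, doc_id in postings:
            cluster[node].setdefault(term, set()).add(doc_id)


def search(cluster, term):
    """Route a single-term lookup to the one node that holds the term."""
    term = term.lower()
    return cluster[node_for(term, len(cluster))].get(term, set())


if __name__ == "__main__":
    cluster = [{} for _ in range(NUM_NODES)]
    apply_plan(cluster, decompose_insert(1, "Parallel document engine"))
    print(search(cluster, "document"))
```

Because each sub-transaction touches only one node in this sketch, the sub-transactions of a request could run in parallel. A real coordinator would still have to make the request as a whole atomic and isolated, which is the role of multi-level transactions in the paper.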