Algorithm engineering for large data sets
Massive data sets arise naturally in many domains: database and geographic information systems, telecommunications, enterprise software, the Internet, and scientific computing. The development of I/O-efficient algorithms and data structures for large data sets has recently received considerable attention, but much less has been done to evaluate their performance. We present the software library Stxxl, which enables practice-oriented experimentation with huge data sets. Stxxl is an implementation of the C++ Standard Template Library (STL) for external-memory computations. It supports parallel disks and the overlapping of I/O with computation, and it is the first external-memory algorithm library to support pipelining, a technique that can save many I/Os. We engineer practical I/O-efficient algorithms and their Stxxl implementations for a number of graph and text processing problems. The performance of Stxxl is evaluated on many synthetic and real-world inputs. This book is written for students, researchers, and software developers who want to learn how the interplay of hardware, software, and state-of-the-art algorithms helps to achieve high-performance processing of massive data.