Undergraduate Topics in Computer Science: Introduction to HPC with MPI for Data Science
- 315 pages
- 12 hours of reading
This gentle introduction to High Performance Computing (HPC) for Data Science using the Message Passing Interface (MPI) standard is a foundational undergraduate course on parallel programming for distributed-memory models, and it requires only basic programming knowledge. The book is divided into two parts. The first part focuses on high performance computing using C++ and MPI, covering essential concepts such as blocking versus non-blocking communications, global communications (e.g., broadcast, scatter), and collaborative computations (reduce). It also discusses Amdahl's and Gustafson's speed-up laws, parallel sorting, and linear algebra on clusters. Common cluster topologies, including the ring, torus, and hypercube, are explained along with their global communication procedures. The first part concludes with the MapReduce model of computation, which is well suited to processing big data using the MPI framework. The second part shifts to high-performance data analytics: it introduces flat and hierarchical clustering algorithms for data exploration, shows how to program these algorithms on clusters, and then covers machine learning classification and an introduction to graph analytics. It closes with a brief overview of data core-sets, which reduce big data problems to smaller, more manageable ones. Each chapter includes exercises for practice, and a final exam helps students assess their understanding of the material.
