Advances in GPU Research and Practice

Book description

Advances in GPU Research and Practice focuses on research and practices in GPU-based systems. The topics range from hardware and architectural issues to high-level issues such as application systems, parallel programming, middleware, and power and energy.

Divided into six parts, this edited volume provides the latest research on GPU computing. Part I: Architectural Solutions focuses on architectural topics that improve the performance of GPUs. Part II: System Software discusses the OS, compilers, libraries, programming environments, languages, and paradigms that are proposed and analyzed to help and support GPU programmers. Part III: Power and Reliability Issues covers different aspects of energy, power, and reliability concerns in GPUs. Part IV: Performance Analysis illustrates mathematical and analytical techniques to predict different performance metrics in GPUs. Part V: Algorithms presents how to design efficient algorithms and analyze their complexity for GPUs. Part VI: Applications and Related Topics provides use cases and examples of how GPUs are used across many sectors.

  • Discusses how to maximize power and obtain peak reliability when designing, building, and using GPUs
  • Covers system software (OS, compilers), programming environments, languages, and paradigms proposed to help and support GPU programmers
  • Explains how to use mathematical and analytical techniques to predict different performance metrics in GPUs
  • Illustrates the design of efficient GPU algorithms in areas such as bioinformatics, complex systems, social networks, and cryptography
  • Provides applications and use case scenarios in several different verticals, including medicine, social sciences, image processing, and telecommunications

Table of contents

  1. Cover image
  2. Title page
  3. Table of Contents
  4. Copyright
  5. Dedication
  6. List of Contributors
  7. Preface
  8. Acknowledgments
  9. Part 1: Programming and Tools
    1. Chapter 1: Formal analysis techniques for reliable GPU programming: current solutions and call to action
      1. Abstract
      2. Acknowledgments
      3. 1 GPUs in Support of Parallel Computing
      4. 2 A quick introduction to GPUs
      5. 3 Correctness issues in GPU programming
      6. 4 The need for effective tools
      7. 5 Call to Action
    2. Chapter 2: SnuCL: A unified OpenCL framework for heterogeneous clusters
      1. Abstract
      2. Acknowledgments
      3. 1 Introduction
      4. 2 OpenCL
      5. 3 Overview of SnuCL framework
      6. 4 Memory management in SnuCL Cluster
      7. 5 SnuCL extensions to OpenCL
      8. 6 Performance evaluation
      9. 7 Conclusions
    3. Chapter 3: Thread communication and synchronization on massively parallel GPUs
      1. Abstract
      2. 1 Introduction
      3. 2 Coarse-Grained Communication and Synchronization
      4. 3 Built-In Atomic Functions on Regular Variables
      5. 4 Fine-Grained Communication and Synchronization
      6. 5 Conclusion and Future Research Direction
    4. Chapter 4: Software-level task scheduling on GPUs
      1. Abstract
      2. Acknowledgments
      3. 1 Introduction, Problem Statement, and Context
      4. 2 Nondeterministic behaviors caused by the hardware
      5. 3 SM-centric transformation
      6. 4 Scheduling-enabled optimizations
      7. 5 Other scheduling work on GPUs
      8. 6 Conclusion and future work
    5. Chapter 5: Data placement on GPUs
      1. Abstract
      2. 1 Introduction
      3. 2 Overview
      4. 3 Memory specification through MSL
      5. 4 Compiler support
      6. 5 Runtime support
      7. 6 Results
      8. 7 Related work
      9. 8 Summary
  10. Part 2: Algorithms and Applications
    1. Chapter 6: Biological sequence analysis on GPU
      1. Abstract
      2. 1 Introduction
      3. 2 Pairwise Sequence Comparison and Sequence-Profile Comparison
      4. 3 Design aspects of GPU solutions for biological sequence analysis
      5. 4 GPU Solutions for Pairwise Sequence Comparison
      6. 5 GPU Solutions for Sequence-Profile Comparison
      7. 6 Conclusion and perspectives
    2. Chapter 7: Graph algorithms on GPUs
      1. Abstract
      2. 1 Graph representation for GPUs
      3. 2 Graph traversal algorithms: the breadth first search (BFS)
      4. 3 The single-source shortest path (SSSP) problem
      5. 4 The APSP problem
      6. 5 Load Balancing and Memory Accesses: Issues and Management Techniques
    3. Chapter 8: GPU alignment of two and three sequences
      1. Abstract
      2. 1 Introduction
      3. 2 GPU architecture
      4. 3 Pairwise alignment
      5. 4 Alignment of three sequences
      6. 5 Conclusion
    4. Chapter 9: Augmented Block Cimmino Distributed Algorithm for solving tridiagonal systems on GPU
      1. Abstract
      2. 1 Introduction
      3. 2 ABCD Solver for tridiagonal systems
      4. 3 GPU implementation and optimization
      5. 4 Performance evaluation
      6. 5 Conclusion and future work
    5. Chapter 10: GPU computing applied to linear and mixed-integer programming
      1. Abstract
      2. Acknowledgments
      3. 1 Introduction
      4. 2 Operations Research in Practice
      5. 3 Exact Optimization Algorithms
      6. 4 Metaheuristics
      7. 5 Conclusions
      8. Conflicts of Interest
    6. Chapter 11: GPU-accelerated shortest paths computations for planar graphs
      1. Abstract
      2. 1 Introduction
      3. 2 Related work
      4. 3 Partitioned Approaches
      5. 4 Computational Complexity Analysis
      6. 5 Experiments and results
      7. About the Authors
    7. Chapter 12: GPU sorting algorithms
      1. Abstract
      2. 1 Introduction
      3. 2 Generic Programming Strategies for GPU
      4. 3 Sorting algorithms
    8. Chapter 13: MPC: An effective floating-point compression algorithm for GPUs
      1. Abstract
      2. Acknowledgments
      3. 1 Introduction
      4. 2 Methodology
      5. 3 Experimental results
      6. 4 Summary and Conclusions
    9. Chapter 14: Adaptive sparse matrix representation for efficient matrix-vector multiplication
      1. Abstract
      2. 1 Introduction
      3. 2 Sparse matrix-vector multiplication
      4. 3 GPU architecture and programming model
      5. 4 Optimization principles for SpMV
      6. 5 Platform (Adaptive Runtime System)
      7. 6 Results and analysis
      8. 7 Summary
  11. Part 3: Architecture and Performance
    1. Chapter 15: A framework for accelerating bottlenecks in GPU execution with assist warps
      1. Abstract
      2. Acknowledgments
      3. 1 Introduction
      4. 2 Background
      5. 3 Motivation
      6. 4 The CABA Framework
      7. 5 A Case for CABA: Data Compression
      8. 6 Methodology
      9. 7 Results
      10. 8 Other Uses of the CABA Framework
      11. 9 Related Work
      12. 10 Conclusion
    2. Chapter 16: Accelerating GPU accelerators through neural algorithmic transformation
      1. Abstract
      2. 1 Introduction
      3. 2 Neural transformation for GPUs
      4. 3 Instruction-set-architecture design
      5. 4 Neural accelerator: design and integration
      6. 5 Controlling quality trade-offs
      7. 6 Evaluation
      8. 7 Related work
      9. 8 Conclusion
    3. Chapter 17: The need for heterogeneous network-on-chip architectures with GPGPUs: A case study with photonic interconnects
      1. Abstract
      2. 1 Introduction
      3. 2 Background
      4. 3 The Need for Heterogeneous Interconnections
      5. 4 Characterization of GPGPU Performance
      6. 5 Conclusion
    4. Chapter 18: Accurately modeling GPGPU frequency scaling with the CRISP performance model
      1. Abstract
      2. Acknowledgments
      3. 1 Introduction
      4. 2 Motivation and related work
      5. 3 GPGPU DVFS performance model
      6. 4 Methodology
      7. 5 Results
      8. 6 Conclusion
  12. Part 4: Power and Reliability
    1. Chapter 19: Energy and power considerations of GPUs
      1. Abstract
      2. 1 Introduction
      3. 2 Evaluation methodology
      4. 3 Power profiling of regular and irregular programs
      5. 4 Affecting power and energy on GPUs
      6. 5 Summary
      7. Appendix
      8. About the authors
    2. Chapter 20: Architecting the last-level cache for GPUs using STT-MRAM nonvolatile memory
      1. Abstract
      2. 1 Introduction
      3. 2 Background
      4. 3 Related Work
      5. 4 Two-Part L2 Cache Architecture
      6. 5 Dynamic Write Threshold Detection Mechanism
      7. 6 Implementation
      8. 7 Evaluation Result
      9. 8 Conclusion
    3. Chapter 21: Power management of mobile GPUs
      1. Abstract
      2. Acknowledgments
      3. 1 Introduction
      4. 2 GPU Power Management for Mobile Games
      5. 3 GPU Power Management for GPGPU Applications
      6. 4 Future Outlook
      7. 5 Conclusions
    4. Chapter 22: Advances in GPU reliability research
      1. Abstract
      2. 1 Introduction
      3. 2 Evaluating GPU Reliability
      4. 3 Hardware Reliability Enhancements
      5. 4 Software Reliability Enhancements
      6. 5 Summary
    5. Chapter 23: Addressing hardware reliability challenges in general-purpose GPUs
      1. Abstract
      2. 1 Introduction
      3. 2 GPGPUs Architecture
      4. 3 Modeling and Characterizing GPGPUs Reliability in the Presence of Soft Errors [25]
      5. 4 RISE: Improving the Streaming Processors’ Reliability Against Soft Errors in GPGPUs [36]
      6. 5 Mitigating the Susceptibility of GPGPUs to PVs [43]
  13. Author Index
  14. Subject Index

Product information

  • Title: Advances in GPU Research and Practice
  • Author(s): Hamid Sarbazi-Azad
  • Release date: September 2016
  • Publisher(s): Morgan Kaufmann
  • ISBN: 9780128037881