Gain hands-on experience with HDF5 for storing scientific data in Python. This practical guide quickly gets you up to speed on the details, best practices, and pitfalls of using HDF5 to archive and share numerical datasets ranging in size from gigabytes to terabytes.
Through real-world examples and practical exercises, you’ll explore topics such as scientific datasets, hierarchically organized groups, user-defined metadata, and interoperable files. Examples are applicable for users of both Python 2 and Python 3. If you’re familiar with the basics of Python data analysis, this is an ideal introduction to HDF5.
Get set up with HDF5 tools and create your first HDF5 file
Work with datasets by learning the HDF5 Dataset object
Understand advanced features like dataset chunking and compression
Learn how to work with HDF5’s hierarchical structure, using groups
Create self-describing files by adding metadata with HDF5 attributes
Take advantage of HDF5’s type system to create interoperable files
Express relationships among data with references, named types, and dimension scales
Discover how Python mechanisms for writing parallel code interact with HDF5
Chapter 1 Introduction
Python and HDF5
What Exactly Is HDF5?
Chapter 2 Getting Started
The HDF5 Tools
Your First HDF5 File
Chapter 3 Working with Datasets
Reading and Writing Data
Chapter 4 How Chunking and Compression Can Help You
Setting the Chunk Shape
Performance Example: Resizable Datasets
Filters and Compression
Chapter 5 Groups, Links, and Iteration: The "H" in HDF5
The Root Group and Subgroups
Working with Links
Iteration and Containership
Multilevel Iteration with the Visitor Pattern
Object Comparison and Hashing
Chapter 6 Storing Metadata with Attributes
Real-World Example: Accelerator Particle Database
Chapter 7 More About Types
The HDF5 Type System
Integers and Floats
The array Type
Dates and Times
Chapter 8 Organizing Data with References, Types, and Dimension Scales
Chapter 9 Concurrency: Parallel HDF5, Threading, and Multiprocessing
Andrew Collette holds a Ph.D. in physics from UCLA and works as a laboratory research scientist at the University of Colorado. He has worked with the Python-NumPy-HDF5 stack at two multimillion-dollar research facilities: the Large Plasma Device at UCLA (entirely standardized on HDF5) and the hypervelocity dust accelerator at the Colorado Center for Lunar Dust and Atmospheric Studies, University of Colorado at Boulder. Dr. Collette is also a leading developer of the HDF5 for Python (h5py) project.
The animals on the cover of Python and HDF5 are Parrot Crossbills (Loxia pytyopsittacus). Rather than being related to parrots in any way, the Parrot Crossbill is actually a species of finch that lives in northwestern Europe and western Russia. There is also a small population in Scotland, where it is difficult to distinguish the Parrot Crossbill from the related Red and Scottish Crossbills. The Parrot Crossbill’s name comes from the fact that the upper mandible overlaps the lower one, giving it the same shape as many parrots’ beaks. This adaptation makes it easy for the birds to extract seeds from conifer cones, which are their main source of food. In Scotland, they are specialist feeders on the cones of the Scots pine. It is very difficult to tell Parrot Crossbills apart from the other species of Loxia, but there are a few clues. Parrot Crossbills are slightly bigger, have the curved beak, and have a deeper call than the others. They also tend to have a bigger head. All three species share the same territory and breeding range; the males are reddish orange in color, while the females are olive green or gray. On average, a female will have a clutch of three or four eggs, which she incubates for about two weeks. Once the chicks have hatched, they live in the nest for about a month before starting out on their own. Due to its large geographic range and stable population numbers, the Parrot Crossbill is not considered endangered or threatened in any way.