Gain hands-on experience with HDF5 for storing scientific data in Python. This practical guide quickly gets you up to speed on the details, best practices, and pitfalls of using HDF5 to archive and share numerical datasets ranging in size from gigabytes to terabytes.
Through real-world examples and practical exercises, you’ll explore topics such as scientific datasets, hierarchically organized groups, user-defined metadata, and interoperable files. Examples are applicable for users of both Python 2 and Python 3. If you’re familiar with the basics of Python data analysis, this is an ideal introduction to HDF5.
Get set up with HDF5 tools and create your first HDF5 file
Work with datasets by learning the HDF5 Dataset object
Understand advanced features like dataset chunking and compression
Learn how to work with HDF5’s hierarchical structure, using groups
Create self-describing files by adding metadata with HDF5 attributes
Take advantage of HDF5’s type system to create interoperable files
Express relationships among data with references, named types, and dimension scales
Discover how Python mechanisms for writing parallel code interact with HDF5
Chapter 1 Introduction
Python and HDF5
What Exactly Is HDF5?
Chapter 2 Getting Started
The HDF5 Tools
Your First HDF5 File
Chapter 3 Working with Datasets
Reading and Writing Data
Chapter 4 How Chunking and Compression Can Help You
Setting the Chunk Shape
Performance Example: Resizable Datasets
Filters and Compression
Chapter 5 Groups, Links, and Iteration: The "H" in HDF5
The Root Group and Subgroups
Working with Links
Iteration and Containership
Multilevel Iteration with the Visitor Pattern
Object Comparison and Hashing
Chapter 6 Storing Metadata with Attributes
Real-World Example: Accelerator Particle Database
Chapter 7 More About Types
The HDF5 Type System
Integers and Floats
The array Type
Dates and Times
Chapter 8 Organizing Data with References, Types, and Dimension Scales
Chapter 9 Concurrency: Parallel HDF5, Threading, and Multiprocessing
Andrew Collette holds a Ph.D. in physics from UCLA, and works as a laboratory research scientist at the University of Colorado. He has worked with the Python-NumPy-HDF5 stack at two multimillion-dollar research facilities; the first being the Large Plasma Device at UCLA (entirely standardized on HDF5), and the second being the hypervelocity dust accelerator at the Colorado Center for Lunar Dust and Atmospheric Studies, University of Colorado at Boulder. Additionally, Dr. Collette is a leading developer of the HDF5 for Python (h5py) project.
The animals on the cover of Python and HDF5 are Parrot Crossbills (Loxia pytyopsittacus). Rather than being related to parrots in any way, the Parrot Crossbill is actually a species of finch that lives in northwestern Europe and western Russia. There is also a small population in Scotland, where it is difficult to distinguish the Parrot from the related Red and Scottish Crossbills.
The Parrot Crossbill’s name comes from the fact that the upper mandible overlaps the lower one, giving it the same shape as many parrots’ beaks. This adaptation makes it easy for the birds to extract seeds from conifer cones, which are their main source of food. In Scotland, they are specialist feeders on the cones of the Scots pine.
It is very difficult to tell Parrot Crossbills apart from the other species of Loxia, but there are a few clues. Parrot Crossbills are slightly bigger, have the curved beak, and have a deeper call than the others. They also tend to have a bigger head. All three species share the same territory and breeding range; the males are reddish orange in color, while the females are olive green or gray.
On average, a female will have a clutch of three or four eggs, which she incubates for about two weeks. Once the chicks have hatched, they live in the nest for about a month before starting out on their own. Due to its large geographic range and stable population numbers, the Parrot Crossbill is not considered endangered or threatened in any way.
A few errors but overall a great book for HDF5 in Python
About Me: Developer
Easy to understand
This guide to HDF5 manipulation - via the Python h5py library - is written by Dr Andrew Collette, a physics laboratory research scientist who is also the leading developer of h5py (one of the two main Python libraries that specialize in HDF5). He puts his extensive background in using h5py, as well as his hands-on experience in its behind-the-scenes development, to full use in this guide, which introduces HDF5 from basic file construction and data storage all the way to advanced topics like using parallel computing in HDF5 file manipulation. Familiarity with Python and the numpy library is assumed, especially data types (dtype) and matrix manipulation, but thankfully, you don't need to be a numpy wizard to follow the examples.
What I like about this book is that it is readable (with little/no excessive jargon), concise, and easy to follow - although it is only 152 pages, all the topics are cleanly structured and comprehensively covered with lots of examples; there is minimal fluff or filler material - to me, this is the ideal technical book: simple, and to the point. If you are wondering whether ploughing through this book is worth it, I can tell you upfront that it is definitely worth a look, especially given the limited information available on the h5py webpage as well as information found through Googling. I especially like the sections on types and references, as well as the best practices he highlighted, especially in terms of data retrieval and writing (I don't use parallel computing so did not delve into the last concurrency chapter, nor the section on dimension scales).
My main grouse, however, is that one of the array compound types described in the book is buggy - it simply didn't work (I'm using h5py 2.2.1), and upon Googling, I found that it is an acknowledged bug. Another grouse (albeit a minor one) is that the title is a bit misleading; it should be called 'Python h5py and HDF5' or something, since PyTables (the other main Python library dealing with HDF5) isn't covered at all.
Overall, this book is worth checking out, especially given the conciseness, clarity of writing and the comprehensive treatment of HDF5 file manipulation. Good stuff! :)
Bottom Line: Yes, I would recommend this to a friend