Python and HDF5
Unlocking Scientific Data
Publisher: O'Reilly Media
Final Release Date: October 2013
Pages: 152

Gain hands-on experience with HDF5 for storing scientific data in Python. This practical guide quickly gets you up to speed on the details, best practices, and pitfalls of using HDF5 to archive and share numerical datasets ranging in size from gigabytes to terabytes.

Through real-world examples and practical exercises, you’ll explore topics such as scientific datasets, hierarchically organized groups, user-defined metadata, and interoperable files. Examples are applicable for users of both Python 2 and Python 3. If you’re familiar with the basics of Python data analysis, this is an ideal introduction to HDF5.

  • Get set up with HDF5 tools and create your first HDF5 file
  • Work with datasets by learning the HDF5 Dataset object
  • Understand advanced features like dataset chunking and compression
  • Learn how to work with HDF5’s hierarchical structure, using groups
  • Create self-describing files by adding metadata with HDF5 attributes
  • Take advantage of HDF5’s type system to create interoperable files
  • Express relationships among data with references, named types, and dimension scales
  • Discover how Python mechanisms for writing parallel code interact with HDF5
Customer Reviews



(based on 1 review)


(1 of 1 customers found this review helpful)


A few errors but overall a great book for HDF5 in Python

By Onn

from Sydney

About Me: Developer

Verified Reviewer


  • Easy to understand
  • Helpful examples
  • Well-written


    Best Uses

    • Intermediate
    • Novice

    Comments about Python and HDF5:

    This guide to HDF5 manipulation via the Python h5py library is written by Dr Andrew Collette, a physics laboratory research scientist who is also the lead developer of h5py (one of the two main Python libraries that specialize in HDF5). He puts his extensive background in using h5py, as well as his hands-on experience developing it behind the scenes, to full use in this guide, which introduces HDF5 from basic file construction and data storage all the way to advanced topics like using parallel computing in HDF5 file manipulation. Familiarity with Python and the NumPy library is assumed, especially data types (dtype) and matrix manipulation, but thankfully you don't need to be a NumPy wizard to follow the examples.

    What I like about this book is that it is readable (with little or no excessive jargon), concise, and easy to follow. Although it is only 152 pages, all the topics are cleanly structured and comprehensively covered with lots of examples, and there is minimal fluff or filler material. To me, this is the ideal technical book: simple and to the point. If you are wondering whether ploughing through this book is worth it, I can tell you upfront that it definitely is, especially given the limited information available on the h5py webpage and through Googling. I especially like the sections on types and references, as well as the best practices he highlights for data retrieval and writing. (I don't use parallel computing, so I did not delve into the last chapter on concurrency, nor the section on dimension scales.)

    My main grouse, however, is that one of the array compound types described in the book is buggy: it simply didn't work (I'm using h5py 2.2.1), and upon Googling I found that it is an acknowledged bug. Another grouse (albeit a minor one) is that the title is a bit misleading; it should be called 'Python h5py and HDF5' or something similar, since PyTables (the other main Python library dealing with HDF5) isn't covered at all.

    Overall, this book is worth checking out, especially given its conciseness, clarity of writing, and comprehensive treatment of HDF5 file manipulation. Good stuff! :)

    Buying Options
    Ebook:  $24.99
    Formats:  DAISY, ePub, Mobi, PDF
    Print & Ebook:  $32.99
    Print:  $29.99