HDF5 for storage of measurement data in condition monitoring

Introduction

Condition monitoring of a machine or its components involves collecting and processing data from several sensors, usually at high sampling rates. An efficient data collection architecture relies, among other factors, on a suitable file format for storing the sensor measurements. These measurements need not only to be stored, but also to be labelled and structured so that they can be retrieved efficiently during the processing stage.

What is the HDF5 format?

HDF5 (Hierarchical Data Format version 5) is a storage format for large and complex datasets. It was originally developed at the National Center for Supercomputing Applications (NCSA), is now maintained by The HDF Group, and is used by scientists and engineers in a wide range of fields.

HDF5 is designed to store large amounts of structured and unstructured data, such as images, audio, video, text, simulation results, and other types of scientific data. The format provides a hierarchical structure in which data can be stored as datasets, groups, and attributes.
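
As a rough illustration of this hierarchy, the sketch below uses the h5py Python package to create one group, one dataset and a few attributes, and then reads them back. The file name, the group path "machine_01/bearing_A" and the attribute names are assumptions made up for this example, not part of any standard.

```python
import os
import tempfile

import h5py
import numpy as np


def write_example(path):
    """Create a small file with nested groups, one dataset and some attributes."""
    with h5py.File(path, "w") as f:
        # Groups behave like folders inside the file (names are hypothetical).
        bearing = f.create_group("machine_01/bearing_A")
        # Datasets are typed, multidimensional arrays.
        signal = np.sin(np.linspace(0.0, 1.0, 1000))
        dset = bearing.create_dataset("acceleration", data=signal)
        # Attributes are small pieces of metadata attached to a group or dataset.
        dset.attrs["unit"] = "m/s^2"
        dset.attrs["sampling_rate_hz"] = 1000.0
        bearing.attrs["sensor"] = "accelerometer"


def describe(path):
    """Return (shape, unit, sensor) read back from the file."""
    with h5py.File(path, "r") as f:
        group = f["machine_01/bearing_A"]
        dset = group["acceleration"]
        return dset.shape, dset.attrs["unit"], group.attrs["sensor"]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        example_path = os.path.join(tmp, "example.h5")
        write_example(example_path)
        print(describe(example_path))
```

The path-like names work much as they do in a file system: "machine_01/bearing_A/acceleration" addresses the dataset directly from the root of the file.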

Advantages of HDF5

Using HDF5 to store experimental data has a number of advantages. Regarding data storage, heterogeneous data can be contained in the same file. In addition, HDF5 supports transparent compression of individual datasets, so a large amount of data from a complex experiment can be saved in a single file of reasonable size.
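
The effect of compression can be checked with a small experiment. The sketch below, which assumes the h5py package and a synthetic, highly repetitive 16-bit signal, writes the same data once without and once with the gzip filter and compares the file sizes. The function names and file names are hypothetical.

```python
import os
import tempfile

import h5py
import numpy as np


def write_signal(path, signal, compression=None):
    """Write `signal` to `path` and return the resulting file size in bytes."""
    with h5py.File(path, "w") as f:
        # compression=None stores the raw samples; "gzip" enables the gzip filter.
        f.create_dataset("signal", data=signal, compression=compression)
    return os.path.getsize(path)


def compare_sizes(tmp_dir):
    """Return (uncompressed_size, gzip_size) for a synthetic periodic signal."""
    phase = np.linspace(0.0, 2.0 * np.pi, 100, endpoint=False)
    one_period = (1000 * np.sin(phase)).astype(np.int16)
    signal = np.tile(one_period, 10_000)  # one million samples
    raw = write_signal(os.path.join(tmp_dir, "raw.h5"), signal)
    packed = write_signal(os.path.join(tmp_dir, "gzip.h5"), signal, compression="gzip")
    return raw, packed


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        raw_size, gzip_size = compare_sizes(tmp)
        print(f"uncompressed: {raw_size} bytes, gzip: {gzip_size} bytes")
```

A perfectly periodic signal compresses far better than real, noisy sensor data, so the ratio obtained here should be read as an upper bound rather than a typical figure.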

With respect to data access, HDF5 supports data slicing. This means that one can read a subset of the data stored in the file without loading its entire content into the computer's RAM, which is crucial when working with large datasets.
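
A minimal sketch of slicing with the h5py package follows. It assumes a two-dimensional dataset laid out as channels by samples. The dataset name "measurements" and the helper names are invented for the example.

```python
import os
import tempfile

import h5py
import numpy as np


def write_measurements(path, n_channels=4, n_samples=10_000):
    """Store a (channels x samples) array of consecutive numbers."""
    data = np.arange(n_channels * n_samples, dtype=np.float64)
    data = data.reshape(n_channels, n_samples)
    with h5py.File(path, "w") as f:
        f.create_dataset("measurements", data=data)


def read_window(path, channel, start, stop):
    """Read samples [start, stop) of one channel; only that slice is loaded."""
    with h5py.File(path, "r") as f:
        dset = f["measurements"]  # opening the dataset reads no sample data yet
        return dset[channel, start:stop]  # NumPy-style slicing on the file


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        example_path = os.path.join(tmp, "measurements.h5")
        write_measurements(example_path)
        window = read_window(example_path, channel=2, start=1000, stop=2000)
        print(window.shape, window[0], window[-1])
```

The slice comes back as an ordinary NumPy array, so the rest of the processing code does not need to know that the data came from a file.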

Another key feature is self-description. HDF5 allows metadata to be attached to individual groups and datasets within the file, giving great flexibility when annotating the data. This also makes it easy to retrieve information from the metadata automatically.
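
One possible way to use the metadata automatically is to search a file for all datasets that carry a given attribute value. The sketch below does this with h5py. The attribute name "unit", the dataset names and the helper functions are assumptions for illustration, and the comparison assumes scalar or string attribute values.

```python
import os
import tempfile

import h5py
import numpy as np


def write_annotated(path):
    """Create two datasets annotated with different (hypothetical) units."""
    with h5py.File(path, "w") as f:
        acc = f.create_dataset("bearing/acceleration", data=np.zeros(10))
        acc.attrs["unit"] = "m/s^2"
        snd = f.create_dataset("housing/sound_pressure", data=np.zeros(10))
        snd.attrs["unit"] = "Pa"


def find_datasets(path, key, value):
    """Return the sorted names of all datasets whose attribute `key` equals `value`."""
    matches = []

    def visitor(name, obj):
        # Called once for every group and dataset below the root of the file.
        if isinstance(obj, h5py.Dataset) and key in obj.attrs:
            if obj.attrs[key] == value:
                matches.append(name)

    with h5py.File(path, "r") as f:
        f.visititems(visitor)
    return sorted(matches)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        example_path = os.path.join(tmp, "annotated.h5")
        write_annotated(example_path)
        print(find_datasets(example_path, "unit", "m/s^2"))
```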

Finally, HDF5 is an open standard, and it is supported by several programming languages and tools such as C, Python, R, MATLAB, LabVIEW, and more. This allows the experimental data contained in an HDF5 file to be used by scientists and engineers working on different computer platforms and with different software.

Disadvantages

No file format is perfect, and HDF5 is no exception. Because of the complexity of its specification, there is only one fully developed implementation, the reference library maintained by The HDF Group, which might create problems if this implementation deviates from the HDF5 specification. Moreover, since nearly all tools rely on that same library, every user and developer is affected by the bugs that appear in it from time to time.

Another issue is that HDF5 is a binary format, and thus not human readable. It needs to be explored with a viewer program such as HDFView, with the h5ls and h5dump command-line tools distributed with the HDF5 library, or through a user-made program. General-purpose text tools in Unix and Windows cannot be used to explore the files.
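
A user-made program for exploring a file does not need to be long. The sketch below, assuming the h5py package, lists every group and dataset in a file together with the shape and data type of each dataset. The function names and the output layout are choices made for this example only.

```python
import os
import tempfile

import h5py
import numpy as np


def list_contents(path):
    """Return one text line per group or dataset found in the file."""
    lines = []

    def visitor(name, obj):
        if isinstance(obj, h5py.Dataset):
            lines.append(f"{name} (dataset, shape={obj.shape}, dtype={obj.dtype})")
        elif isinstance(obj, h5py.Group):
            lines.append(f"{name} (group)")

    with h5py.File(path, "r") as f:
        f.visititems(visitor)
    return lines


def write_demo(path):
    """Create a tiny demo file so that the listing has something to show."""
    with h5py.File(path, "w") as f:
        f.create_dataset("machine/acceleration", data=np.zeros(10))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        example_path = os.path.join(tmp, "demo.h5")
        write_demo(example_path)
        for line in list_contents(example_path):
            print(line)
```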

Nevertheless, the HDF5 format has been widely used in scientific applications for a couple of decades, which suggests that its robustness and efficiency outweigh its shortcomings.

Relevance to condition monitoring

As we discussed in the introduction, condition monitoring involves the collection, storage and analysis of machine data using a wide array of sensors, such as accelerometers, microphones, strain gauges, load cells, and more. This is often done at high sampling rates, particularly when vibration analysis is necessary, for example when monitoring elements such as rolling bearings and gears.

The monitoring process results in large volumes of numerical data from heterogeneous sensors, possibly from different manufacturers, that often need to be accessed remotely in order to detect faults, assess the state of the machine component, and estimate its remaining useful lifetime. The capabilities of HDF5 reviewed in this article fit these needs well, which makes this file format a strong candidate for a common way of storing measurement data.
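
To make this more concrete, the sketch below shows one possible way of recording a continuously sampled signal with h5py: blocks of samples are appended to a resizable, compressed dataset that carries its sampling rate and unit as attributes. The dataset path, chunk size, attribute names and function names are assumptions for this example, not a recommended layout.

```python
import os
import tempfile

import h5py
import numpy as np


def append_block(dset, block):
    """Grow a one-dimensional dataset and write `block` at its end."""
    n_old = dset.shape[0]
    dset.resize(n_old + len(block), axis=0)
    dset[n_old:] = block


def record(path, blocks, sampling_rate_hz):
    """Write an iterable of sample blocks to `path`; return the total sample count."""
    with h5py.File(path, "w") as f:
        dset = f.create_dataset(
            "bearing/acceleration",  # hypothetical layout: one group per component
            shape=(0,),
            maxshape=(None,),        # unlimited length along the time axis
            dtype="f4",
            chunks=(4096,),          # resizable datasets must be chunked
            compression="gzip",
        )
        dset.attrs["sampling_rate_hz"] = sampling_rate_hz
        dset.attrs["unit"] = "m/s^2"
        for block in blocks:
            append_block(dset, block)
        return dset.shape[0]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        example_path = os.path.join(tmp, "recording.h5")
        # Three simulated acquisition blocks of 1000 samples each.
        simulated = [np.full(1000, float(i)) for i in range(3)]
        print(record(example_path, simulated, sampling_rate_hz=25600.0))
```

Because the dataset grows block by block, the acquisition program never has to hold the complete recording in memory, and the analysis side can later read any time window by slicing.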

More information?

Would you like to know more about the HDF5 format and its use in condition monitoring? Contact us.