I/O Architectures
GSWHC-B Getting Started with HPC Clusters K1.3-B I/O Architectures
Relevant for: Tester, Builder, and Developer
Description:
- You will learn about I/O architectures used in HPC environments: local, distributed, parallel and hierarchical file systems (basic level)
This skill requires no sub-skills
Level: basic
I/O Architectures
Before starting to use an HPC system it is important to have a basic understanding of the file system architectures used in clusters:
Local file systems can be provided on nodes that are equipped with disks. Only the node itself can access its disks. Local file systems are useful for (heavy) scratch I/O. The advantage of local disks is that I/O performance scales perfectly with the number of nodes. The disadvantage is that, for an overall view, data has to be collected from all the nodes it is stored on. Because disks are a major source of failures, many clusters have diskless nodes.
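The local-scratch pattern can be sketched as follows: intermediate I/O goes to the node-local disk, and only the final result is copied to the shared file system before the job ends. This is a minimal sketch; the scratch location (often exported as $TMPDIR by the batch system) and the shared directory vary between clusters, so temporary directories stand in for both here.

```python
import os
import shutil
import tempfile

# Assumed paths: $TMPDIR as the node-local scratch location (cluster-specific),
# and a temporary directory standing in for a job directory on the shared file system.
scratch_root = os.environ.get("TMPDIR", tempfile.gettempdir())
shared_dir = tempfile.mkdtemp(prefix="shared_")

# Heavy intermediate I/O stays on the node-local disk ...
scratch_file = os.path.join(scratch_root, "intermediate.dat")
with open(scratch_file, "w") as f:
    for step in range(1000):
        f.write(f"step {step}\n")   # frequent scratch writes never touch the network

# ... and only the final result is copied to the shared file system.
shutil.copy(scratch_file, os.path.join(shared_dir, "result.dat"))
os.remove(scratch_file)             # clean up local scratch before the job ends
```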
Distributed (or network) file systems are provided by a file server which is integrated into the cluster. These file systems are available on all nodes. A typical example is the /home file system. While all nodes can read concurrently from a distributed file system without interference, care must be taken with concurrent writing. In general, a file should only be written or modified by a single process at a time. A classic distributed file system is the Network File System (NFS). Network file systems are not designed for (very) high I/O loads.
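The one-writer-per-file rule can be sketched like this: every worker writes its own file, so no two writers ever modify the same file and no locking is needed on NFS. Threads stand in here for the cluster processes that would each run on a different node; the directory is a stand-in for a shared /home.

```python
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor

shared_dir = tempfile.mkdtemp(prefix="nfs_")  # stand-in for a shared /home directory

def write_partial_result(rank):
    # One file per writer: concurrent reads are safe anyway, and because no
    # two workers write to the same file, concurrent writes are safe too.
    path = os.path.join(shared_dir, f"result.{rank:04d}.dat")
    with open(path, "w") as f:
        f.write(f"data from worker {rank}\n")
    return path

# Threads simulate the separate cluster processes of a real job.
with ThreadPoolExecutor(max_workers=4) as pool:
    paths = list(pool.map(write_partial_result, range(4)))
# A single process can later merge the partial files for an overall view.
```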
Parallel (or cluster) file systems are global file systems like distributed file systems, i.e. they can be used on any node in the cluster. They are designed to deliver high I/O bandwidth and provide large disk space. The parallel or cluster aspect is twofold. Firstly, the hardware itself is parallel (the file system is provided by several servers that operate in a coordinated way). Secondly, parallel I/O is enabled, i.e. more than one process can consistently write to the same file at the same time. The names (mountpoints) of these file systems vary. There is no standard name for them like /home, although the /home file system can be put on a parallel file system.
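The idea behind consistent parallel writes to one file can be sketched with offset-based I/O, as used by libraries such as MPI-IO: each rank writes a fixed-size record into its own, non-overlapping region of a single shared file. This sketch runs the ranks sequentially in one process (in a real job each rank is a separate process on its own node) and uses the POSIX-only os.pwrite, which writes at an absolute offset.

```python
import os
import tempfile

record_size = 16
num_ranks = 4
path = os.path.join(tempfile.mkdtemp(prefix="pfs_"), "shared.dat")

# Pre-size the shared file once, as a parallel I/O library would.
with open(path, "wb") as f:
    f.truncate(record_size * num_ranks)

def write_record(rank):
    # Each rank's region [rank*record_size, (rank+1)*record_size) is disjoint
    # from all others, so the writes cannot conflict.
    data = f"rank {rank}".encode().ljust(record_size, b" ")
    fd = os.open(path, os.O_WRONLY)
    try:
        os.pwrite(fd, data, rank * record_size)   # absolute offset, no shared cursor
    finally:
        os.close(fd)

for rank in range(num_ranks):   # sequential here; concurrent in a real job
    write_record(rank)
```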
Parallel file systems fit the classic HPC scenario well, i.e. large computer simulations that perform I/O only once in a while (a few minutes per hour of computation, say). Limitations of I/O performance become noticeable if too many processes are doing I/O, and also if very many small files are used (using many small files is a traditional approach under Unix, but this concept becomes a bottleneck if too many processes employ it on a parallel file system). Examples of parallel file systems are Lustre, IBM Spectrum Scale and BeeGFS.
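The small-files problem can be illustrated in a few lines: creating one tiny file per record costs one set of metadata operations (create, open, close) per record, and these serialize on the file system's metadata servers. Appending all records to a single container file keeps the metadata load constant. The paths below are illustrative only.

```python
import os
import tempfile

workdir = tempfile.mkdtemp(prefix="packing_")

# Anti-pattern on a parallel file system (shown only as a comment):
#   for i in range(10000):
#       open(f"record.{i}.txt", "w").write(...)   # 10000 metadata operations
#
# Better: one container file, one set of metadata operations, one data stream.
packed = os.path.join(workdir, "records.dat")
with open(packed, "w") as f:
    for i in range(10000):
        f.write(f"record {i}\n")
```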
A file system with hierarchical storage management (HSM) has at least two tiers of different storage media. Typically, there are two tiers: a faster, smaller, more expensive one and a slower, larger, less expensive one. Data is moved automatically between the two tiers. While this is elegant, care should be taken when such a system is heavily used, i.e. it might be necessary to stage data to the fast tier, or unstage it to the slow tier, manually. At present, probably all HSM systems contain spinning disks. Disks can be the slow tier if the fast tier consists of SSDs, or the fast tier if the slow tier consists of tapes. With tapes in the background, it is important not to generate too many files, because it can take a long time to bring them back from tape to disk.
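One common way to avoid scattering many files onto tape is to bundle them into a single archive before they migrate to the slow tier, so that a later recall fetches one large object instead of triggering one tape operation per file. A minimal sketch with Python's tarfile module (the directory layout is made up for illustration):

```python
import os
import tarfile
import tempfile

workdir = tempfile.mkdtemp(prefix="hsm_")
results = os.path.join(workdir, "results")
os.makedirs(results)

# Many small output files from a job ...
for i in range(100):
    with open(os.path.join(results, f"out.{i:03d}.txt"), "w") as f:
        f.write(f"result {i}\n")

# ... are bundled into one archive before the HSM migrates them to tape,
# so recalling the results later touches a single object, not 100.
archive = os.path.join(workdir, "results.tar")
with tarfile.open(archive, "w") as tar:
    tar.add(results, arcname="results")
```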
All the file systems mentioned above are mounted on (some or all) nodes of the cluster and can be used via the file handling commands of the operating system. For completeness, external storage should be mentioned, which requires special (remote copy) commands. Object stores are an example.