Status and Future Work
The PeCoH project started on March 1, 2017, and runs for three years (funding one full-time PhD candidate and one full-time postdoctoral position). In processing the work packages, three main topics can be emphasized for scientific study:
Establishing Software Engineering for HPC
Our observations show that parallel programs are often developed in the same way as they were decades ago. One obvious reason is that the scientific computing community focuses on publishing scientific results rather than on the software used to obtain them. In addition, the software engineering community has so far not engaged effectively with the field of HPC. In the PeCoH project we therefore want to measure the positive impact of software engineering practices on scientific productivity, e.g.
- Efficient Algorithms and Data Structures
- Object-Oriented Development
- Agile Software Development, Automated Testing / Test-Driven Development
- Coding Guidelines, Refactoring
- Version and Configuration Management
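To make the testing item above concrete, the following sketch shows how automated testing can protect a scientific kernel during refactoring; the function, tolerance, and test are our own illustrative choices, not artifacts of the project.

```python
import math

def trapezoid(f, a, b, n):
    """Approximate the integral of f over [a, b] with n trapezoids."""
    h = (b - a) / n
    s = 0.5 * (f(a) + f(b)) + sum(f(a + i * h) for i in range(1, n))
    return h * s

def test_trapezoid_sine():
    # A regression test pins the numerical result before refactoring,
    # so later optimizations can be verified not to change the answer.
    result = trapezoid(math.sin, 0.0, math.pi, 1000)
    assert abs(result - 2.0) < 1e-5  # the analytic integral of sin on [0, pi] is 2
```

Such tests, run automatically on every change, let developers optimize or restructure code with confidence that the scientific results stay unchanged.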
Development of a Cost Model
Our cost model will be based mainly on
- the resource usage of batch jobs
- the time spent on the different tasks of rewriting existing code in pilot studies that apply performance engineering concepts
A subsequent cost analysis then lets us estimate the cost-benefit ratio of novel approaches. This in turn raises users' awareness and knowledge of performance engineering, i.e., it assists in identifying and quantifying potential efficiency improvements in scientific parallel codes and in parallel code usage.
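The cost-benefit estimate described above can be sketched as follows; all rates and job figures here are hypothetical placeholders, since the actual inputs come from batch-job accounting and pilot-study time tracking.

```python
# Hypothetical rates for illustration only (not project data).
NODE_HOUR_COST = 0.50     # EUR per node-hour (assumed)
PERSON_HOUR_COST = 60.0   # EUR per developer hour (assumed)

def cost_benefit_ratio(node_hours_before, node_hours_after,
                       tuning_person_hours, runs_per_year):
    """Ratio of yearly compute savings to the one-off tuning effort."""
    saved_per_run = (node_hours_before - node_hours_after) * NODE_HOUR_COST
    yearly_saving = saved_per_run * runs_per_year
    tuning_cost = tuning_person_hours * PERSON_HOUR_COST
    return yearly_saving / tuning_cost

# e.g. a job tuned from 100 to 60 node-hours per run,
# 200 runs per year, after 40 hours of tuning effort:
ratio = cost_benefit_ratio(100, 60, 40, 200)  # about 1.67
```

A ratio above 1 means the tuning effort pays for itself within a year, which is exactly the kind of quantified argument the cost analysis is meant to provide.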
Transfer of HPC Know-how and Providing Tuning Principles
In our experience, scientists' problems are often dominated by getting things to work at all, i.e., users focus on getting a parallel job to run rather than on using the expensive HPC resources appropriately. The Hamburg HPC Competence Center (HHCC), as a virtual institution, takes this into account and provides basic and advanced HPC knowledge to improve the situation via online content, e.g. for
- Basic Competences: Linux command line, shell scripts, robust job scripts, job chaining
- System Building: compilers, optimization flags, linker, libraries, debugger, code optimization
- Hardware Architectures: shared memory systems, distributed systems, hybrid systems
- I/O Operations: storage systems, choice of file sizes vs. number of files, compression
- Detecting Performance Issues: measuring speedups (compared to the best known sequential algorithm) and efficiencies, monitoring resource utilization at the application level