Using Shell Scripts
GSWHC-B Getting Started with HPC Clusters \(\rightarrow\) USE1-B Use of the Cluster Operating System \(\rightarrow\) USE1.2-B Using Shell Scripts
Relevant for: Tester, Builder, and Developer
Description:
- You will learn to handle scripting tasks that are often needed in batch scripts: manipulating filenames, temporary files, tracing command execution, error handling, trivial parallelization (basic level)
This skill requires the following sub-skill:
- USE1.1-B Use of the Command Line Interface (\(\leftarrow\) USE1-B Use of the Cluster Operating System)
Level: basic
Using shell scripts
Shell scripts are used to store complicated commands or to automate tasks. A simple script consists of one or a few commands, written exactly as they would be typed on the interactive command line. Since the shell is also a programming language, scripts can carry out more complicated processes. On HPC systems, scripting is needed for creating batch jobs. This section explains some scripting tasks that are useful for writing batch scripts.
Manipulating filenames
Handling filenames translates to character string processing. The following table shows some typical examples:
action | command | result |
---|---|---|
initialization | a=foo | a=foo |
initialization | b=bar | b=bar |
concatenation | c=$a/$b.c | c=foo/bar.c |
concatenation | d=${a}_$b.c | d=foo_bar.c |
get directory | dir=$(dirname $c) | dir=foo |
get filename | file=$(basename $c) | file=bar.c |
remove suffix | name=$(basename $c .c) | name=bar |
remove suffix | name=${file%.c} | name=bar |
remove prefix | ext=${file##*.} | ext=c |
Recommendation: Never use white space in filenames! It is error-prone, because every use of the name then has to be quoted, as in: dir=$(dirname "$c").
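The operations from the table can be combined in a short script. The following is a minimal sketch, assuming bash; the path is a hypothetical example that deliberately contains a space and a double suffix, to show the effect of quoting and the difference between the shortest-match (% and #) and longest-match (%% and ##) forms.

```shell
#!/bin/bash
# Hypothetical path, chosen only for illustration
path="my dir/bar.tar.gz"

dir=$(dirname "$path")      # directory part: "my dir"
file=$(basename "$path")    # filename part:  "bar.tar.gz"

short=${file%.*}            # remove shortest matching suffix: "bar.tar"
stem=${file%%.*}            # remove longest matching suffix:  "bar"
ext=${file##*.}             # remove longest matching prefix:  "gz"
rest=${file#*.}             # remove shortest matching prefix: "tar.gz"

printf '%s\n' "$dir" "$file" "$short" "$stem" "$ext" "$rest"
```

Without the double quotes around $path, dirname and basename would receive two separate arguments, which is the kind of error the recommendation above is meant to avoid.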
Temporary files
There are three issues with temporary files: the choice of the directory to write them to, unique names, and automatic deletion.
Assume that a batch job is to work in a temporary directory. The computing center may provide such a directory for every batch job and delete it when the job ends. In that case one can simply use that directory. Otherwise one can proceed as follows.
- Choose a file system (or top directory) to work in. The classic directory for this purpose is /tmp. However, this is not a good choice on (almost all) HPC clusters, because /tmp is probably too small on diskless nodes. You should find out which file system is well suited on your system. There can be local file systems (on nodes that are equipped with local disks) or global file systems. Let us call that file system /scratch and set:
top_tmpdir=/scratch
- A sub-directory with a unique name can be generated with the mktemp command. mktemp generates a unique name from a template by replacing a sequence of Xs with a unique value. It prints the unique name, so that it can be stored in a variable. For easy identification of your temporary directories you can use your username (which is contained in the variable $USER) in the template, and set:
my_tmpdir=$(mktemp -d "$top_tmpdir/$USER.XXXXXXXXXXXX")
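- The next line in our example handles automatic deletion. Whenever the script exits, our temporary directory is deleted. The single quotes defer expansion of the variable until the trap runs, and the inner double quotes protect the path:
trap 'rm -rf -- "$my_tmpdir"' EXIT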
- Now we can work in our temporary directory:
cd "$my_tmpdir"
...
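Put together, the steps above might look like the following sketch. It assumes bash and, so that it can be tried on any machine, falls back to $TMPDIR or /tmp as the top directory; on a cluster you would substitute the scratch file system of your site. The file name work.dat is a hypothetical placeholder.

```shell
#!/bin/bash
# Assumption: use $TMPDIR if set, else /tmp; replace with your scratch file system
top_tmpdir=${TMPDIR:-/tmp}

# Create a uniquely named directory; stop if that fails
my_tmpdir=$(mktemp -d "$top_tmpdir/${USER:-$(id -un)}.XXXXXXXXXXXX") || exit 1

# Single quotes: $my_tmpdir is expanded when the trap fires, not when it is set
trap 'rm -rf -- "$my_tmpdir"' EXIT

# Work inside the temporary directory; stop if it cannot be entered
cd "$my_tmpdir" || exit 1
echo "intermediate result" > work.dat   # hypothetical work file
ls -l
```

Checking the exit status of mktemp and cd avoids the situation where later commands write into an unintended directory.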
Tracing command execution
There are two shell settings for tracing command execution. After set -v all commands are printed as they appear literally in the script. After set -x commands are printed as they are being executed (i.e. with variables expanded). Both settings are also useful for debugging.
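As an illustration, tracing can be switched on for a part of a script only and switched off again with set +v and set +x. This is a minimal sketch, assuming bash; the variable names are hypothetical. The trace output is written to standard error, and lines produced by set -x are prefixed with a plus sign by default.

```shell
#!/bin/bash
name=world

set -v                    # print each following script line as it is read
greeting="hello $name"
set +v                    # switch verbose mode off again

set -x                    # print each following command after expansion
echo "$greeting"
set +x                    # switch execution tracing off again
```

In a batch job, the trace usually ends up in the job's error file, next to any error messages of the traced commands.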
Error handling
There are two shell settings that can help to handle errors.
The first setting is set -e, which makes the script exit immediately if a command exits with an error (non-zero) status. The second setting is set -u, which makes the script exit if an undefined variable is accessed (this is also useful for debugging).
Exceptions can be handled in the following ways. If -e is set, an error status can be ignored by using the or operator || and calling the true command, which is a no-op command that always succeeds:
command_that_could_go_wrong || true
If -u is set, an empty value can be substituted when a variable that is unset is accessed (the braces and the dash after _set do the job):
if [[ ${variable_that_might_not_be_set-} = test_value ]]
then
...
fi
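The following sketch shows both settings together with the exceptions described above. It assumes bash; NUM_TASKS and the missing path are hypothetical names used only for illustration. Besides the ${variable-} form, it uses the related ${variable:-default} form, which substitutes a default value instead of an empty one.

```shell
#!/bin/bash
set -eu

# grep exits with status 1 when nothing matches; "|| true" keeps -e from stopping the script
matches=$(printf 'alpha\nbeta\n' | grep -c gamma || true)
echo "matches: $matches"

# Use a default if the (hypothetical) variable NUM_TASKS is unset or empty
unset NUM_TASKS
tasks=${NUM_TASKS:-1}
echo "tasks: $tasks"

# A command tested in an if condition does not trigger -e, so the failure can be handled
if ! ls /path/that/does/not/exist >/dev/null 2>&1; then
    echo "handled: directory is missing"
fi
```

Testing a command in an if condition is an alternative to || true when the script should react to the failure rather than ignore it.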
Trivial parallelization
In this context, trivial parallelization means starting more than one executable at the same time. For example, two graphics cards can be used in the following way:
# start 2 CUDA binaries (in the background)
CUDA_VISIBLE_DEVICES=0 cudaBinary1 input1 &
CUDA_VISIBLE_DEVICES=1 cudaBinary2 input2 &
# wait for completion of both (all) background jobs
# (do not exit right away)
wait
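A plain wait does not report whether the background jobs succeeded. One possible extension, sketched here under the assumption of bash, records the process ID of each background job and waits for each one separately to collect its exit status. The function process_input is a hypothetical stand-in for a real executable.

```shell
#!/bin/bash
# Hypothetical stand-in for a real executable
process_input() {
    sleep 1
    echo "done: $1"
}

# Start one background job per input and remember its process ID
pids=()
for input in input1 input2 input3; do
    process_input "$input" &
    pids+=("$!")
done

# Wait for each job individually and count the failures
failed=0
for pid in "${pids[@]}"; do
    wait "$pid" || failed=$((failed + 1))
done
echo "failed tasks: $failed"
```

The order of the "done" lines can vary from run to run, because the jobs execute concurrently.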
A more powerful way to start many tasks or to process a task queue is GNU Parallel.