Set up the simulation as normal. The pre-processing steps for a cluster simulation are identical to those for a simulation run locally. The one exception is file paths: if the simulation is transient and output files are generated at specific timesteps, ensure that the destination directory is given in UNIX path format (forward slashes).
Specifying a Max Time
Most clusters impose a time limit that jobs must adhere to: simulations cannot run indefinitely, and will be stopped after a set amount of time elapses. If a simulation is forcefully stopped, all data is lost, so it is best practice to set a maximum run time in CFX rather than let the scheduler (the program that actually controls jobs) kill the simulation.
If you are not satisfied with the convergence of the stopped solution, you can rerun the simulation using its results file as the initial conditions to obtain better convergence; but if the solver is killed by the scheduler, there is no results file to initialize from. To set the time control, open the Setup (CFX-Pre) module, and in the left pane open Simulation > Flow Analysis 1 > Solver > Solver Control. Enable "Elapsed Wall Clock Time Control" and set the maximum time to a value below the maximum allowable job time.
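This setting is stored in the definition file as CCL (CFX Command Language). A rough sketch of the relevant fragment is shown below; the parameter names are approximate and should be verified against a file exported from your own CFX-Pre session. Here the maximum run time is 5.5 hours, comfortably under a 6-hour queue limit:

```
FLOW: Flow Analysis 1
  SOLVER CONTROL:
    ELAPSED WALL CLOCK TIME CONTROL:
      Maximum Run Time = 5.5 [hr]
    END
  END
END
```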
Export the Solver Definition File
The next step is to export the Solver Definition File. This file contains all the information necessary to perform the simulation, including the mesh, the models to use, the boundary conditions, and so on.
To export the file, right-click the Simulation Control entity in the tree and select "Write Solver Input File". Then, in the dialog presented, choose a location to save the file.
Details on how to transfer files to the cluster over SFTP were given previously in the Getting Started tutorial. For the simulation to run correctly, the file must be placed in the /short directory, preferably in the subdirectory named after your username, e.g. /short/stuart.
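As a sketch of the transfer from a local machine (the hostname bc247, the username stuart, and the filename CFX.def are placeholders; substitute your own):

```
# Upload the Solver Definition File into /short/<username> on the cluster.
# Hostname, username and filename here are placeholders.
sftp stuart@bc247 <<'EOF'
cd /short/stuart
put CFX.def
EOF
```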
PBS stands for Portable Batch System. It is the program installed on bc247 that controls how, when, and where users' programs run. bc247 has many other software packages installed and is used for many different research projects. Users submit individual "jobs" into a "queue", and PBS allocates resources across the compute nodes. To run your job, you must write a PBS script that specifies what resources you need, how long the job will take, and what you want to run.
A PBS script is just a text file containing a list of directives and commands. It can be created on a local machine and uploaded using SFTP, or created directly on the cluster using a Linux text editor such as vim or nano (a quick web search will turn up guides for either editor). Either way, the PBS script must end up on the cluster.
Below is an example of a PBS script for running CFX. The first line specifies that the script is to be run in the bash shell, one of the UNIX command languages. The line with the -q flag specifies the queue to place the job in. The lines with -l flags specify the resources to be used for the job. The walltime value of 06:00:00 means the job will run for at most six hours; this should be set slightly longer than the maximum run time specified in the CFX pre-processor. The nodes specification gives the number of nodes (in this case 10), the type of node (N14), and the processors per node (8). The nodes on bc247 are labelled by the computer lab they are in; the labs are N11, N12, N14, N15, N16, N18, N19, N20 and N21, but not all labs are available at all times, so you will need to know which lab to use. All the compute nodes have 8-core processors, so it is recommended to leave the processors per node (ppn) at 8. Using the nodes specification below, 8 processors will run on each of 10 nodes, for a total of 80 processors in this job.

The -e and -o lines give the locations of the error file and output file respectively. The -V directive means that all environment variables will be replicated on the compute nodes. Finally, the -wd directive tells the scheduler to start the job in the working directory; using this means the path to the CFX definition file does not have to be specified as an absolute path.
#!/bin/bash
#PBS -q normal
#PBS -l walltime=06:00:00
#PBS -l nodes=10:N14:ppn=8
#PBS -e stderr3.txt
#PBS -o stdout3.txt
#PBS -V
#PBS -wd
# Build a comma-separated processor list for -par-dist from the node file PBS provides
nodes=`cat $PBS_NODEFILE`
nodes=`echo $nodes | sed -e 's/ /,/g'`
/apps/ansys_inc/v150/cfx/bin/cfx5solve -def "CFX.def" -double -par-dist $nodes -start-method "HP MPI Distributed Parallel"
The final line is where the simulation is actually run. The CFX solver executable is installed at /apps/ansys_inc/v150/cfx/bin/cfx5solve, and following it is a list of options. The -def option gives the Solver Definition Filename (in this case CFX.def); -double tells the solver to run in double precision; -par-dist indicates a distributed parallel run; and $nodes is the comma-separated list of nodes assigned to this particular job. Apart from the Solver Definition Filename, none of these options should need to be changed.
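The node-list conversion in the script can be seen in isolation; the node names below are made up for illustration:

```shell
# The scheduler lists node names separated by whitespace; cfx5solve's
# -par-dist option expects them separated by commas.
nodes="cn-n14-01 cn-n14-02 cn-n14-03"
nodes=`echo $nodes | sed -e 's/ /,/g'`
echo $nodes   # cn-n14-01,cn-n14-02,cn-n14-03
```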
Running the Simulation
Once the PBS script has been created, ensure that both the script and the Solver Definition File are in the same directory, and that you have a bash shell open in that same directory. In the bash shell, type the following command to submit the job to the queue:

qsub pbsjob.sh
where pbsjob.sh is the filename of the PBS script created above. The job is now in the queue, and will start once enough nodes become available.
To check the status of a job, use the following command:

qstat
You will be presented with a list of all the current jobs on the system, with a letter indicating each job's status:

Q Job is queued, waiting for available resources.
R Job is running.
E Job is exiting after having finished (successfully or unsuccessfully).
C Job is completed; check the output and error files for results or error messages.
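As a sketch of reading the status column programmatically (the job IDs and names below are invented sample output, and the column position may differ slightly between PBS versions):

```shell
# Sample qstat-style output; the state letter is in the 5th column here.
qstat_output="123.bc247 sim1 stuart 00:01:02 R normal
124.bc247 sim2 stuart 0 Q normal"
# Count jobs still waiting in the queue (state Q).
echo "$qstat_output" | awk '$5 == "Q" { n++ } END { print n+0 }'   # 1
```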