====== Queuing-System (SLURM) and Jobfiles ======

**Target audience: Beginners in Environment Modules and the Queuing-System SLURM**
  * You will learn how to use Environment Modules, a widely used system for handling different software environments (basic level)
  * You will learn to use the workload manager SLURM to allocate HPC resources (e.g. CPUs) and to submit a batch job (basic level)
  * You will learn what a simple jobscript looks like and how to submit it (basic to intermediate level)

===== General Information =====

| help on module itself    | ''module help''    |
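For orientation, a typical Environment Modules session on the command line might look like the following sketch (the module name is only an example taken from the jobscript further below; the modules actually available differ per cluster and can be listed with ''module avail''):

<code>
module avail                      # list all modules available on the system
module load comp/gcc/6.3.0        # load a specific module (example name)
module list                       # show the currently loaded modules
module unload comp/gcc/6.3.0      # remove the module from the environment again
</code>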
  
===== Basics of SLURM =====

__//Introduction://__

^ Specification ^ Option ^ Comments ^
| Number of nodes requested | --nodes=//N// |  |
| Number of tasks to invoke on each node | --tasks-per-node=//n// | Can be used to specify the number of cores to use per node, e.g. to avoid [[https://en.wikipedia.org/wiki/Hyper-threading|hyper-threading]]. (If option is omitted, all cores and hyperthreads are used; hint: using hyperthreads is not always advantageous.) |
| Partition | --partition=//partitionname// | SLURM supports multiple partitions instead of several queues |
| Job time limit | --time=//time-limit// | time-limit may be given as minutes or in hh:mm:ss or d-hh:mm:ss format (d means number of days) |
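These options can also be passed directly on the command line. As a minimal sketch (partition name, values, and jobscript file name are only illustrative; exact option names may differ slightly between SLURM versions):

<code>
sbatch --partition=std --nodes=2 --tasks-per-node=16 --time=00:10:00 jobscript.sh
</code>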
For the ''sbatch'' command these options may also be specified directly in the job script using a pseudo comment directive starting with ''#SBATCH'' as a prefix. The directives must precede any executable command in the batch script:

        #!/bin/bash
        #SBATCH --partition=std
        #SBATCH --nodes=2
        #SBATCH --tasks-per-node=16
        #SBATCH --time=00:10:00
        ...
        srun ./helloParallelWorld

A complete list of parameters can be retrieved from the ''man'' pages for ''sbatch'', ''salloc'', or ''srun'', e.g. via

        man sbatch
  
__//Monitoring job and system information://__

| View detailed job information. | ''scontrol'' show job //jobid// |
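For example, a quick check of your own jobs and the overall cluster state could combine these standard SLURM commands (the job ID is illustrative):

<code>
squeue -u $USER           # list your own pending and running jobs
sinfo                     # show the state of the partitions and nodes
scontrol show job 123456  # detailed information on one specific job
</code>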
  
__//Retrieving accounting information://__

There are two commands for retrieving accounting information:
  * ''sacct''
    * shows accounting information for jobs and job steps in the SLURM job accounting log or SLURM database. For active jobs the accounting information is accessed via the job accounting log file. For completed jobs it is accessed via the log data saved in the SLURM database. Command line options can be used to filter, sort, and format the output in a variety of ways. Columns for jobid, jobname, partition, account, allocated CPUs, state, and exit code are shown by default for each of the user's jobs eligible after midnight of the current day.
  * ''sacctmgr''
    * is mainly used by cluster administrators to view or modify the SLURM account information, but it also offers users the possibility to get some information about their account. The account information is maintained within the SLURM database. Command line options can be used to filter, sort, and format the output in a variety of ways.
    * The table below lists basic user activities for retrieving accounting information and the corresponding SLURM commands.

^ User Activity ^ SLURM Command ^
| View job account information for a specific job. | ''sacct'' -j //jobid// |
| View all job information from a specific start date (given as yyyy-mm-dd). | ''sacct'' -S //startdate// -u $USER |
| View execution time for a (completed) job (formatted as days-hh:mm:ss, cumulated over job steps, and without any header). | ''sacct'' -n -X -P -o Elapsed -j //jobid// |
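For example (the job ID and the start date are only illustrative):

<code>
sacct -j 123456                      # accounting information for one job
sacct -S 2020-11-01 -u $USER         # all of your jobs since the given date
sacct -n -X -P -o Elapsed -j 123456  # elapsed time of a completed job
</code>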
===== Jobscripts =====
  
__//Submitting a batch job://__
  
Below, an example script for a SLURM batch job (in the sense of a hello world program) is given. The job is suited to run on the Phoenix HPC cluster at the Gauß-IT-Zentrum. For other cluster systems some adjustments will probably be necessary.
  
<code>
#!/bin/bash
   # Do not forget to select a proper partition if the default
   # one is no fit for the job! You can do that either in the sbatch
   # command line or here with the other settings.
#SBATCH --partition=standard
   # Number of nodes used:
#SBATCH --nodes=2
   # Wall clock limit:
#SBATCH --time=12:00:00
   # Name of the job:
#SBATCH --job-name=nearest
   # Number of tasks (cores) per node:
#SBATCH --ntasks-per-node=20

   # If needed, set your working environment here.
working_dir=~
cd $working_dir

   # Load environment modules for your application here.
module load comp/gcc/6.3.0
module load mpi/openmpi/2.1.0/gcc

   # Execute the application.
mpiexec -np 40 ./test/mpinearest
</code>
  
The job script file above can be stored e.g. in ''$HOME/hello_world.sh'' (''$HOME'' is mapped to the user's home directory).
  
The job is submitted to SLURM's batch queue using the default value for the partition (''scontrol show partitions'', also see above, can be used to show that information):

<code>
[exampleusername@node001 14:48:33]~$ sbatch $HOME/hello_world.sh
Submitted batch job 123456
</code>
The start time can be selected via ''--begin'', for example:

<code>--begin=16:00
--begin=2010-01-20T12:34:00
</code>
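For instance, the hello-world job from above could be submitted so that it does not start before 4 p.m. (the file name is the one assumed earlier):

<code>
[exampleusername@node001 14:48:33]~$ sbatch --begin=16:00 $HOME/hello_world.sh
</code>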
More information can be found via ''man sbatch''. All parameters shown there can also be set in the jobscript via ''#SBATCH'' directives.

The output of ''sbatch'' will contain the job ID, 123456 in this example. During execution the output of the job is written to a file named ''slurm-123456.out''.
If there had been errors (i.e. any output to the //stderr// stream), a corresponding file named ''slurm-123456.err'' would have been created.
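To follow the output of a running job, the standard ''tail'' command can be used on that file (the job ID is illustrative):

<code>
tail -f slurm-123456.out
</code>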
__//Cancelling a batch job://__
  
<code>scancel <jobid></code>
  
The required job ID can be viewed via the general command ''squeue'' or the user-specific command ''squeue -u $USER''.
  
If you want to delete all jobs of a user:
  
<code>scancel -u <username></code>
  
__//How to change a node status (root only)://__
  
<code>scontrol update nodename=node[005-008] state=drain reason="RMA"</code>
  
This command excludes the nodes from the list of available nodes. This ensures that no more jobs can be submitted to them, allowing them to be used for testing or repairs.
  
<code>scontrol update nodename=node[005-008] state=idle</code>
  
This reverses the previous command and returns the nodes to the list of available nodes. Executing this command might also be necessary if a node crash caused the removal of a node from the batch system.
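To check the current state of the nodes and the recorded reason for a drain, the following standard SLURM commands can be used (the node name is illustrative):

<code>
sinfo -R                    # list nodes that are down, drained, or failing, with reasons
scontrol show node node005  # detailed state information for a single node
</code>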
  
===== Interactive jobs (intermediate difficulty) =====

__//Method one://__

Assume you have submitted a job as follows:

<code>sbatch beispiel.job
Submitted batch job 1256</code>

Let the corresponding jobfile be the following:
  
<code>
</code>
  
In this case, the command ''squeue -l'' will show you which node the job is currently running on. For example:
  
<code>1256 standard towhee raskrato  RUNNING       0:04 7-00:00:00      1 node282</code>
  
You can then log onto that node via ''ssh node282'' and start a new shell via [[https://linuxize.com/post/how-to-use-linux-screen/|screen]] (please follow the link for further information). The program can then be started in this new shell; a short example session is sketched after the list below.
  
Once you are done, you can detach from the shell (it keeps running in the background) via:

<code>
Ctrl-a d
</code>
  
  * You can start as many shells as you like. The command ''screen -r'' shows a list of all shells (if there is only one, you will immediately reattach to that shell).
  * You can reattach to a shell running in the background via ''screen -r <shellnumber>''.
  * You can quit a shell by pressing the key combination //CTRL+C// and typing ''exit''.
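A minimal example of this workflow might look as follows (the node name matches the ''squeue -l'' output above; the program name is hypothetical):

<code>
ssh node282                # log onto the node the job is running on
screen                     # open a shell that survives logging out
./my_long_running_program  # start your program inside this shell
                           # detach with Ctrl-a d, reattach later with screen -r
</code>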
  
Another way to use the allocated nodes is via the ''salloc'' command (see method two below).

__//Method two://__

Interactive sessions under control of the batch system can be created via ''salloc''. ''salloc'' differs from ''sbatch'' in that resources are initially only reserved (i.e. allocated) without executing a job script. Also, the session runs on the node on which ''salloc'' was invoked (i.e. not on a compute node, in contrast to submission with ''sbatch''). This is often useful during the interactive development of a parallel program.
 + 
A single node is reserved for interactive usage as follows:

<code>[exampleusername@node001 14:48:33]~$ salloc</code>

When the resources are granted by SLURM, ''salloc'' will start a new shell on the (login or head) node where ''salloc'' was executed. This interactive session is terminated by exiting the shell or by reaching the time limit.

An OpenMP program using //N// threads, for example, can be started on the allocated node as follows:

<code>
[exampleusername@node001 14:48:33]~$ export OMP_NUM_THREADS=N
[exampleusername@node001 14:48:33]~$ srun my-openmp-binary
</code>
To start an interactive parallel MPI program, //N// nodes can be allocated as follows:

<code>[exampleusername@node001 14:48:33]~$ salloc --nodes=N</code>

The MPI program using //n=32// processes, for example, can be started on the allocated nodes as follows:

<code>[exampleusername@node001 14:48:33]~$ mpirun -np 32 my-mpi-binary</code>

Another way to use the allocated nodes is to establish ''ssh'' connections to them (see method one above).