Job Partitions/Queues

Partitions/Queues for Sooner

When you submit a job to Sooner, you must specify a partition (also called a queue) for that job to run in. The partition tells Sooner which compute nodes to use so that the job runs efficiently. Sooner has several public partitions that can be used, depending on the type of job you plan to run.

Each partition/queue below is listed with its job execution time limit in parentheses, followed by a description.

sooner_test (time limit: 48 hours)

This is the default partition/queue, which includes all compute nodes from the other sooner_test_XXgb_XXcore partitions. 

This partition/queue supports both multi-node (e.g. MPI) and single-node jobs requiring up to 64 CPU cores and ~250 GB RAM per node.

Jobs in this partition/queue should be limited to 256 CPU cores (13 20-core compute nodes or 11 24-core compute nodes or 4 64-core compute nodes).
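
For example, a multi-node MPI job on sooner_test that stays within the suggested 256-core limit might use batch directives like the following. This is only a minimal sketch: the job name, output file, and executable name (my_mpi_program) are placeholders, and the node and core counts should be adjusted to fit your job.

#!/bin/bash
# Request 4 of the 64-core nodes, 64 MPI tasks per node (256 cores total,
# the suggested cap for this partition/queue), for up to the 48-hour limit.
#SBATCH --partition=sooner_test
#SBATCH --nodes=4
#SBATCH --ntasks-per-node=64
#SBATCH --time=48:00:00
#SBATCH --job-name=mpi_example
#SBATCH --output=mpi_example_%j.out

# Launch one MPI task per allocated core (assumes an MPI library that
# integrates with srun; use mpirun if yours does not).
srun ./my_mpi_program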

sooner_gpu_test (time limit: 48 hours)

This is the default GPU partition/queue, which includes all GPU-equipped compute nodes from the other sooner_gpu_test_XXX partitions. This queue is heterogeneous, including nodes with NVIDIA L40S, H100, and RTX 6000 Ada GPUs.

This partition/queue supports both multi-node (e.g. MPI) and single-node jobs requiring up to 64 CPU cores and ~250 GB RAM per node.

sooner_test_32gb_20core (time limit: 48 hours)

This partition/queue supports both multi-node (e.g. MPI) and single-node jobs requiring up to 20 CPU cores and ~30 GB RAM per node.

Jobs in this partition/queue should be limited to 240 CPU cores (up to 12 compute nodes).

Compute nodes in this partition are equipped with dual Intel Xeon Haswell 10-core CPUs (20 cores per node) and 32 GB RAM.

sooner_test_64gb_24core (time limit: 48 hours)

This partition/queue supports both multi-node (e.g. MPI) and single-node jobs requiring 21-24 CPU cores and/or ~31-60 GB RAM per node.

Jobs in this partition/queue should be limited to 240 CPU cores (10 compute nodes).

Compute nodes in this partition are equipped with dual Intel Xeon Haswell 12-core CPUs (24 cores per node) and 64 GB RAM.

sooner_test_128gb_64core (time limit: 48 hours)

This partition/queue supports both multi-node (e.g. MPI) and single-node jobs requiring 25-64 CPU cores and/or ~61-125 GB RAM per node.

Jobs in this partition/queue should be limited to 256 CPU cores (4 compute nodes).

Compute nodes in this partition are equipped with dual Intel Xeon Ice Lake 32-core CPUs (64 cores per node) or dual AMD Rome 32-core CPUs (64 cores per node), each with 128 GB RAM.

sooner_test_256gb_64core (time limit: 48 hours)

This partition/queue supports both multi-node (e.g. MPI) and single-node jobs requiring 25-64 CPU cores and/or ~126-250 GB RAM per node.

Jobs in this partition/queue should be limited to 256 CPU cores (4 compute nodes).

Compute nodes in this partition are equipped with dual Intel Xeon Ice Lake 32-core CPUs (64 cores per node) or dual Intel Sapphire Rapids 32-core CPUs (64 cores per node), each with 256 GB RAM.

sooner_test_largemem (time limit: 48 hours)

This partition/queue is for jobs that each fit in a single (but big) compute node and that need up to 128 CPU cores and/or ~500 to ~4000 GB RAM.

This partition/queue has Sooner's only general-use large RAM node, which has quad Intel Sapphire Rapids 32-core CPUs (128 cores total) and 4096 GB (4 TB) RAM.
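
For jobs on this large-memory node, a minimal sketch of a batch request is shown below; memory is requested explicitly with --mem. The core count, memory amount, job/output names, and the executable (my_bigmem_program) are placeholders to adjust for your own job.

#!/bin/bash
# One task using 32 of the node's 128 cores and ~2 TB of its 4 TB of RAM.
#SBATCH --partition=sooner_test_largemem
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=32
#SBATCH --mem=2000G
#SBATCH --time=48:00:00
#SBATCH --job-name=bigmem_example
#SBATCH --output=bigmem_example_%j.out

./my_bigmem_program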

sooner_test_longjobs (time limit: 7 days)

This partition/queue is for jobs that need to run for 2 to 7 days.

This partition/queue has 15 general-use compute nodes:

* 9 general-use compute nodes that each have dual Intel Xeon Haswell 10-core CPUs (20 cores total) and 32 GB RAM

PLUS

* 6 general-use compute nodes that each have dual Intel Xeon Haswell 12-core CPUs (24 cores total) and 64 GB RAM.

The nodes in this partition/queue are a subset of the nodes in the sooner_test partition/queue and other general-use partitions/queues.

sooner_gpu_test_h100
sooner_gpu_test_h100_dual
sooner_gpu_test_h100_quad
(time limit: 48 hours)

These partitions support both multi-node (e.g. MPI) and single-node GPU jobs requiring up to 4 H100 80GB GPUs per node, along with up to 64 CPU cores and ~250 GB RAM per node.

GPUs must be requested using the following SLURM directive: --gres=gpu:<N>.

These partitions/queues are equipped with NVIDIA H100 80GB GPUs and dual Intel Sapphire Rapids 32-core CPUs (64 cores per node), with either 512 GB of RAM for dual-GPU nodes or 1024 GB of RAM for quad-GPU nodes. The “dual” queues include nodes with at least two GPUs per node, while the “quad” queues include nodes with four GPUs per node.
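
For example, a single-node job that wants two H100 GPUs could combine the --gres directive above with a partition request, as in the following minimal sketch. The CPU and time values, the job/output names, and the executable (my_gpu_program) are placeholders; adjust them for your own job.

#!/bin/bash
# One task with 16 CPU cores and 2 H100 GPUs on a single node, up to 24 hours.
#SBATCH --partition=sooner_gpu_test_h100_dual
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=16
#SBATCH --gres=gpu:2
#SBATCH --time=24:00:00
#SBATCH --job-name=gpu_example
#SBATCH --output=gpu_example_%j.out

./my_gpu_program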

sooner_gpu_test_ada (time limit: 48 hours)

This partition/queue supports both multi-node (e.g. MPI) and single-node GPU jobs requiring up to 2 Ada 50GB GPUs per node, along with up to 64 CPU cores and ~250 GB RAM per node.

GPUs must be requested using the following SLURM directive: --gres=gpu:<N>.

Compute nodes in this partition/queue are equipped with NVIDIA RTX 6000 Ada 50GB GPUs and dual Intel Sapphire Rapids 32-core CPUs (64 cores per node) with 256 GB of RAM.

sooner_gpu_test_dual
sooner_gpu_test_quad
(time limit: 48 hours)

These partitions/queues support both multi-node (e.g., MPI) and single-node GPU jobs. They are heterogeneous, including nodes with NVIDIA L40S, H100, and RTX 6000 Ada GPUs.

GPUs must be requested using the SLURM directive: --gres=gpu:<N>.

The partitions are built on nodes equipped with either dual Intel Sapphire Rapids 32-core CPUs (64 cores per node) or dual Intel Ice Lake 32-core CPUs (64 cores per node), with memory ranging from 256 GB to 1000 GB per node. “Dual” queues consist of nodes with at least two GPUs per node, while “quad” queues consist of nodes with four GPUs per node.

Job Partitions/Queues on Schooner

Each partition/queue below is listed with its job execution time limit in parentheses, followed by a description.

32gb_20core (time limit: 48 hours)

This partition/queue is for jobs that use multiple compute nodes (e.g., MPI parallel jobs), or jobs that fit in a single compute node, and only need up to 20 CPU cores in each node and up to ~30 GB RAM in each node.

Jobs in this partition/queue should be limited to 240 CPU cores (12 compute nodes).

This partition/queue has general-use compute nodes that each have dual Intel Xeon Haswell 10-core CPUs (20 cores per node) and 32 GB RAM.

64gb_24core (time limit: 48 hours)

This partition/queue is for jobs that use multiple compute nodes (e.g., MPI parallel jobs), or jobs that fit in a single compute node, and need 21 to 24 CPU cores in each node and/or ~31 to ~60 GB RAM in each node.

Jobs in this partition/queue should be limited to 240 CPU cores (10 compute nodes).

This partition/queue has general-use compute nodes that each have dual Intel Xeon Haswell 12-core CPUs (24 cores per node) and 64 GB RAM.

128gb_64core (time limit: 48 hours)

This partition/queue is for jobs that use multiple compute nodes (e.g., MPI parallel jobs), or jobs that fit in a single compute node, and need 25 to 64 CPU cores in each node and/or ~61 to ~125 GB RAM in each node.

Jobs in this partition/queue should be limited to 256 CPU cores (4 compute nodes).

This partition/queue has general-use compute nodes that each have dual Intel Ice Lake 32-core CPUs (64 cores per node) or dual AMD Rome 32-core CPUs (64 cores per node) and 128 GB RAM.

256gb_64core (time limit: 48 hours)

This partition/queue is for jobs that use multiple compute nodes (e.g., MPI parallel jobs), or jobs that fit in a single compute node, and need 25 to 64 CPU cores in each node and/or ~126 to ~250 GB RAM in each node.

Jobs in this partition/queue should be limited to 256 CPU cores (4 compute nodes).

This partition/queue has general-use compute nodes that each have dual Intel Ice Lake 32-core CPUs (64 cores per node) or dual AMD Rome 32-core CPUs (64 cores per node) and 256 GB RAM.

normal
OR
normal_avx2
(time limit: 48 hours)

This partition/queue is for jobs that use multiple compute nodes (e.g., MPI parallel jobs), or jobs that fit in a single compute node, and only need up to 64 CPU cores in each node and up to ~250 GB RAM in each node.

Jobs in this partition/queue should be limited to 256 CPU cores (13 20-core compute nodes or 11 24-core compute nodes or 4 64-core compute nodes).

This is the default partition/queue, combining all of the compute nodes in all of the above partitions.

normal_avx512 (time limit: 48 hours)

This partition/queue is for running executables that have AVX-512 instructions, which work on the following CPU families: Intel: Skylake, Cascade Lake, Ice Lake, Sapphire Rapids; AMD: Genoa.

Jobs in this partition/queue should be limited to 256 CPU cores (4 compute nodes).

This partition/queue has general-use compute nodes that each have dual Intel Ice Lake or Sapphire Rapids 32-core CPUs (64 cores per node) and 128 - 256 GB of RAM.

normal_well
OR
normal_fdr10
(time limit: 48 hours)

This partition/queue is the combination of the 32gb_20core and 64gb_24core partitions.

Jobs in this partition/queue should be limited to 240 CPU cores.

This partition/queue has general-use compute nodes that each have dual Intel Xeon Haswell CPUs (20-24 cores per node) and 32 to 64 GB RAM. Nodes in this partition are connected to FDR10 InfiniBand network switches, at 40 Gigabits per second.

normal_hdr100 (time limit: 48 hours)

This partition/queue is the combination of the 128gb_64core and 256gb_64core partitions.

Jobs in this partition/queue should be limited to 256 CPU cores (4 compute nodes).

This partition/queue has general-use compute nodes that each have dual Intel Ice Lake or Sapphire Rapids 32-core CPUs or dual AMD Rome or Milan 32-core CPUs (64 cores per node) and 128 to 256 GB of RAM. Nodes in this partition are connected to HDR100 InfiniBand network switches, at 100 Gigabits per second.

large_mem
OR
largemem
(time limit: 48 hours)

This partition/queue is for jobs that each fit in a single (but big) compute node and that need 25 to 32 CPU cores and/or ~500 to ~2000 GB RAM.

This partition/queue has Schooner's only general-use large RAM node, which has quad Intel Xeon Haswell 8-core CPUs (32 cores total) and 2048 GB (2 TB) RAM.

longjobs (time limit: 7 days)

This partition/queue is for jobs that need to run for 2 to 7 days, specifically:

* jobs that use up to 41 compute nodes (e.g., MPI parallel jobs), or jobs that each fit in a single compute node, and that need up to 20 CPU cores in each node and up to ~30 GB in each node,

OR

* jobs that use up to 3 compute nodes (e.g., MPI parallel jobs), or jobs that fit in a single compute node, and that need 21 to 24 CPU cores in each node and/or ~31 to ~60 GB in each node.

This partition/queue has 41 general-use compute nodes:

* 38 general-use compute nodes that each have dual Intel Xeon Haswell 10-core CPUs (20 cores total) and 32 GB RAM

PLUS

* 3 general-use compute nodes that each have dual Intel Xeon Haswell 12-core CPUs (24 cores total) and 64 GB RAM.

The nodes in this partition/queue are a subset of the nodes in the normal partition/queue and other general-use partitions/queues.

longlargemem
OR
largelongmem
(time limit: 7 days)

Same as largemem, EXCEPT jobs can run for up to 7 days.

largejobs (time limit: 48 hours)

This partition/queue is for multi-node parallel jobs (e.g., MPI parallel jobs) that need either

* 241 to 6000 CPU cores/13 to 300 compute nodes, in nodes that each use up to 20 CPU cores and up to ~30 GB RAM,

OR

* 241 to 840 CPU cores/11 to 35 compute nodes, in nodes that each use 21 to 24 CPU cores and/or ~30 to ~60 GB RAM.

NOTE: Jobs of up to 240 CPU cores/12 compute nodes should be submitted to the other general-use queues, NOT to largejobs.

This partition/queue has all of the compute nodes found in both the 32gb_20core partition/queue and the 64gb_24core partition/queue.

debug (time limit: 30 minutes)

For testing, debugging and performance benchmarking of your software, for up to 30 minutes per job on up to 3 compute nodes, specifically:

* 2 compute nodes of 20 CPU cores and 32 GB each

AND

* 1 compute node of 24 CPU cores and 64 GB.

Almost always, jobs submitted to the debug queue start within 30 minutes of being submitted, but of course this ISN'T GUARANTEED.
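
For quick interactive testing on the debug partition/queue, a session can often be started with srun rather than a batch script (this is a sketch of standard SLURM usage; check with OSCER if interactive jobs are restricted on this cluster):

srun --partition=debug --nodes=1 --ntasks=4 --time=00:30:00 --pty bash

This requests 4 CPU cores on one debug node for up to 30 minutes and opens a shell on the allocated node.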

debug_5min (time limit: 5 minutes)

For very quick testing, debugging and performance benchmarking of your software, for up to 5 minutes per job on up to 3 compute nodes, specifically:

* 2 compute nodes of 20 CPU cores and 32 GB each

Almost always, jobs submitted to the debug_5min queue start within 5 minutes of being submitted, but of course this ISN'T GUARANTEED.

To specify a partition, use the following line of code in your batch script:

#SBATCH --partition=[partition name]

Example: To use the sooner_test partition:

#SBATCH --partition=sooner_test
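
Putting it together, a complete minimal batch script might look like the following sketch. The resource values, job and output names, and the program being run (my_program) are placeholders to adjust for your own job.

#!/bin/bash
# One task with 20 CPU cores on a single node, for up to 12 hours.
#SBATCH --partition=sooner_test
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=20
#SBATCH --time=12:00:00
#SBATCH --job-name=my_job
#SBATCH --output=my_job_%j.out

./my_program

Submit the script with:

sbatch my_batch_script.sh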

Please contact us at support@oscer.ou.edu if you have questions about which partition is best for your job.