The ensicompute cluster

The cluster consists of 13 Dell R740 servers.

Hardware configuration

All 13 servers share the same configuration. They differ only in the number and model of the GPUs they carry.

| Nodes | CPU model | #CPUs x #CoresPerCPU x #ThreadsPerCore | RAM | #GPUs | GPU model | VRAM |
|---|---|---|---|---|---|---|
| tesla | Intel(R) Xeon(R) Silver 4210 CPU @ 2.20GHz | 2 x 10 x 2 = 40 | 128 GB | 1 | Tesla V100 | 32 GB |
| turing-1 ... turing-11 | Intel(R) Xeon(R) Silver 4210 CPU @ 2.20GHz | 2 x 10 x 2 = 40 | 128 GB | 3 | Quadro RTX 6000 | 24 GB |
| ampere | Intel(R) Xeon(R) Silver 4210 CPU @ 2.20GHz | 2 x 10 x 2 = 40 | 128 GB | 3 | A40 | 48 GB |
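
The "2 x 10 x 2 = 40" column is the number of logical CPUs the operating system sees on each node. As a minimal sketch of that arithmetic, the following Python snippet recomputes it and scales it to the 13 nodes. The `NodeCpu` class and the constant names are illustrative only and are not part of any cluster tooling.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NodeCpu:
    """CPU layout of one node, as listed in the hardware table."""
    sockets: int
    cores_per_socket: int
    threads_per_core: int

    @property
    def logical_cpus(self) -> int:
        # Logical CPUs = sockets x cores per socket x hardware threads per core
        return self.sockets * self.cores_per_socket * self.threads_per_core


# Values taken from the table: two Xeon Silver 4210, 10 cores each, 2 threads per core
node = NodeCpu(sockets=2, cores_per_socket=10, threads_per_core=2)
NODE_COUNT = 13
RAM_PER_NODE_GB = 128

print("logical CPUs per node:", node.logical_cpus)
print("logical CPUs in the cluster:", NODE_COUNT * node.logical_cpus)
print("total RAM (GB):", NODE_COUNT * RAM_PER_NODE_GB)
```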

Focus on the GPUs

The platform has one NVIDIA V100 card, 3 NVIDIA A40 cards and 33 NVIDIA RTX 6000 cards (3 in each of the 11 turing nodes). The sections below describe their characteristics and how they are connected at the hardware level (GPU / core affinity).

NVIDIA RTX 6000

The turing-[1..11] compute nodes are each equipped with 3 NVIDIA RTX 6000 GPUs.

root@turing-11:~# nvidia-smi

+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.54.15              Driver Version: 550.54.15      CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  Quadro RTX 6000                Off |   00000000:3B:00.0 Off |                    0 |
| N/A   30C    P8             13W /  250W |       0MiB /  23040MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
|   1  Quadro RTX 6000                Off |   00000000:AF:00.0 Off |                    0 |
| N/A   30C    P8             13W /  250W |       0MiB /  23040MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
|   2  Quadro RTX 6000                Off |   00000000:D8:00.0 Off |                    0 |
| N/A   30C    P8             13W /  250W |       0MiB /  23040MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
root@turing-11:~# nvidia-smi topo -m
      GPU0      GPU1    GPU2    CPU Affinity    NUMA Affinity   GPU NUMA ID
GPU0   X        SYS     SYS     0,2,4,6,8,10    0               N/A
GPU1  SYS        X      SYS     1,3,5,7,9,11    1               N/A
GPU2  SYS       SYS      X      1,3,5,7,9,11    1               N/A

Legend:

  X    = Self
  SYS  = Connection traversing PCIe as well as the SMP interconnect between NUMA nodes (e.g., QPI/UPI)
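
The topology matrix above shows that GPU0 is attached to NUMA node 0 while GPU1 and GPU2 are attached to NUMA node 1, so a job generally benefits from running on cores local to the GPU it uses. The sketch below parses text in the format printed above and shows one possible way to pin the current process accordingly. The names `parse_topo`, `expand_cpu_list` and `pin_to_gpu` are hypothetical helpers, and the parser assumes the matrix contains only GPU columns (no NIC rows or columns).

```python
import os

# Sample copied from the `nvidia-smi topo -m` output shown above
TOPO_SAMPLE = """\
      GPU0      GPU1    GPU2    CPU Affinity    NUMA Affinity   GPU NUMA ID
GPU0   X        SYS     SYS     0,2,4,6,8,10    0               N/A
GPU1  SYS        X      SYS     1,3,5,7,9,11    1               N/A
GPU2  SYS       SYS      X      1,3,5,7,9,11    1               N/A
"""


def expand_cpu_list(spec):
    """Turn a list such as '0,2,4' or '0-3,8' into a list of CPU ids."""
    cpus = []
    for part in spec.split(","):
        low, _, high = part.partition("-")
        cpus.extend(range(int(low), int(high or low) + 1))
    return cpus


def parse_topo(text):
    """Map each GPU name to (cpu_ids, numa_node)."""
    # Data rows start with 'GPU'; the header and legend lines are indented
    rows = [line.split() for line in text.splitlines() if line.startswith("GPU")]
    gpu_count = len(rows)
    affinity = {}
    for fields in rows:
        # Layout: name, one link entry per GPU, CPU affinity, NUMA affinity, GPU NUMA ID
        cpus = expand_cpu_list(fields[1 + gpu_count])
        numa_node = int(fields[2 + gpu_count])
        affinity[fields[0]] = (cpus, numa_node)
    return affinity


def pin_to_gpu(gpu_index, affinity):
    """Restrict this process to one GPU and to the cores local to it (Linux only)."""
    cpus, _numa_node = affinity[f"GPU{gpu_index}"]
    # Must be set before the CUDA runtime is initialised in this process
    os.environ["CUDA_VISIBLE_DEVICES"] = str(gpu_index)
    os.sched_setaffinity(0, cpus)  # 0 designates the calling process


for name, (cpus, numa_node) in parse_topo(TOPO_SAMPLE).items():
    print(name, "NUMA node", numa_node, "CPUs", cpus)
```

`pin_to_gpu` is deliberately not called here, since it only makes sense on a node that actually has the listed cores. On a turing node, a job targeting GPU1 could call `pin_to_gpu(1, parse_topo(output))` before importing any CUDA library.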

NVIDIA V100

The tesla compute node is equipped with one NVIDIA V100 card.

root@tesla:~# nvidia-smi
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.54.15              Driver Version: 550.54.15      CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  Tesla V100-PCIE-32GB           Off |   00000000:3B:00.0 Off |                    0 |
| N/A   34C    P0             25W /  250W |       0MiB /  32768MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|  No running processes found                                                             |
+-----------------------------------------------------------------------------------------+
root@tesla:~# nvidia-smi topo -m
      GPU0      CPU Affinity    NUMA Affinity   GPU NUMA ID
GPU0   X        0,2,4,6,8,10    0               N/A

Legend:

  X    = Self

NVIDIA A40

The ampere compute node is equipped with 3 NVIDIA A40 cards.

root@ampere:~# nvidia-smi
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.54.15              Driver Version: 550.54.15      CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA A40                     Off |   00000000:3B:00.0 Off |                    0 |
|  0%   31C    P8             22W /  300W |       0MiB /  46068MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
|   1  NVIDIA A40                     Off |   00000000:AF:00.0 Off |                    0 |
|  0%   32C    P8             21W /  300W |       0MiB /  46068MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
|   2  NVIDIA A40                     Off |   00000000:D8:00.0 Off |                    0 |
|  0%   31C    P8             12W /  300W |       0MiB /  46068MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
root@ampere:~# nvidia-smi topo -m
      GPU0      GPU1    GPU2    CPU Affinity    NUMA Affinity   GPU NUMA ID
GPU0   X        SYS     SYS     0,2,4,6,8,10    0               N/A
GPU1  SYS        X      SYS     1,3,5,7,9,11    1               N/A
GPU2  SYS       SYS      X      1,3,5,7,9,11    1               N/A

Legend:

  X    = Self
  SYS  = Connection traversing PCIe as well as the SMP interconnect between NUMA nodes (e.g., QPI/UPI)
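
The `nvidia-smi` tables above report memory in MiB (for instance 46068MiB per A40), which is why the usable figure is slightly below the nominal capacity of the card. As a hedged sketch, the snippet below reads the same information in machine-readable form through the `--query-gpu` option of `nvidia-smi` and converts it to GiB. The helper names and the `SAMPLE` string are illustrative; the sample is built from the values shown above rather than captured from a real run.

```python
import subprocess


def parse_gpu_csv(text):
    """Parse 'index, name, memory.total' CSV lines (no header, no units)."""
    gpus = []
    for line in text.strip().splitlines():
        index, name, memory_mib = [field.strip() for field in line.split(",")]
        gpus.append({
            "index": int(index),
            "name": name,
            "memory_gib": int(memory_mib) / 1024,  # MiB to GiB
        })
    return gpus


def query_gpus():
    """Run nvidia-smi on the current node and return the parsed GPU list."""
    result = subprocess.run(
        ["nvidia-smi",
         "--query-gpu=index,name,memory.total",
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    )
    return parse_gpu_csv(result.stdout)


# Illustrative input mirroring the ampere node described above
SAMPLE = "0, NVIDIA A40, 46068\n1, NVIDIA A40, 46068\n2, NVIDIA A40, 46068\n"

for gpu in parse_gpu_csv(SAMPLE):
    print(gpu["index"], gpu["name"], f"{gpu['memory_gib']:.1f} GiB")
```

On a compute node, `query_gpus()` can replace `parse_gpu_csv(SAMPLE)`; it raises `FileNotFoundError` when `nvidia-smi` is not installed.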