Dept of Radiology Computing Infrastructure

Server Info

11/07/2024
HTCondor updated to 24.0.1. For details, see https://htcondor.org/htcondor/release-highlights/#long-term-support-channel

10/27/2024
Virtual infrastructure was migrated from VMware to Proxmox

10/10/2024
Additional 139TB SSD RAID now available (radraid2)

02/29/2024
Harbor upgraded to version 2.9.1

04/16/2021
139TB SSD RAID now available (radraid)

01/16/2021
Two Lambda RTX 8000 servers are now available

11/26/2019
Local Docker registry added: https://registry.cvib.ucla.edu/v2/_catalog

Check current and average GPU usage (internal only)
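
The internal page tracks usage over time; for a quick point-in-time check you can also run nvidia-smi directly on any GPU server (a generic sketch; the hostname is just one example from the table below):

    ssh REDLRADADM23589.ad.medctr.ucla.edu
    # Per-GPU utilization, memory, and the processes currently using each card
    nvidia-smi
    # The same information in a compact, scriptable form
    nvidia-smi --query-gpu=index,name,utilization.gpu,memory.used,memory.total --format=csv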

Running GPU example code with HTCondor

  •  ssh to a GPU server
  •  Run the command git clone https://github.com/CHTC/templates-GPUs.git
  •  cd templates-GPUs/test
  •  Run the submit.sub job with the command condor_submit submit.sub
  •  Monitor job status with condor_q
  •  Review the job output files when the job completes
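
Taken together, a typical session looks like the sketch below (the GPU server hostname is just one example from the table that follows; the output, error, and log filenames are whatever submit.sub specifies):

    ssh REDLRADADM23620.ad.medctr.ucla.edu
    git clone https://github.com/CHTC/templates-GPUs.git
    cd templates-GPUs/test
    condor_submit submit.sub     # queues the example job
    condor_q                     # shows the job while it is idle or running
    condor_q -nobatch            # per-job view; the job leaves the queue when it finishes
    ls                           # the output, error, and log files named in submit.sub land here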

GPU Servers

Hostname         GPU Device            GPUs  GPU Mem (GB)  CPUs  Mem (GB)
REDLRADADM23589  Tesla V100-SXM2-32GB  8     32            40    512
REDLRADADM14958  Quadro RTX 8000       10    48            48    1024
REDLRADADM14959  Quadro RTX 8000       10    48            48    1024
REDLRADADM23620  GeForce RTX 2080 Ti   4     12            10    128
REDLRADADM23621  GeForce RTX 2080 Ti   4     12            10    128
REDLRADADM23710  GeForce RTX 2080 Ti   4     12            10    128
REDLRADADM23712  GeForce GTX 980       3     4             16    128
REDWRADMMC23199  GeForce GTX 1080 Ti   4     12            6     96
redlradbei05920  GeForce RTX 2080 Ti   4     12            10    128
  • Connecting: ssh REDLRADADM23589.ad.medctr.ucla.edu
  • Using the NGC Catalog
    • Choose a container by clicking the down arrow in any of the panels; this copies the pull command to your clipboard
    • ssh to the DGX-1 and paste the command you copied from the catalog to pull the container image
  • Running with HTCondor
    • See the “Getting Started” tab for examples and modify the submit file; a sketch of such a modification follows this list.
      • For example, in templates-GPUs/docker/tensorflow_python/test_tensorflow.sub:
        • Change: Requirements = (Target.CUDADriverVersion >= 10.1)
        • To: Requirements = ((Target.CUDADriverVersion >= 10.1) && (Machine == "REDLRADADM23589.ad.medctr.ucla.edu"))
  • Users can store up to 200GB in their /raid/username directory
  • There are currently no GPU usage policies enforced, so users are asked not to monopolize GPUs interactively. Over time we’ll put policies in place as users condorize their applications for batch execution.
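
As referenced in the list above, a minimal sketch of what the modified submit file might look like, assuming the docker universe and an image pulled from NGC (the image tag, resource requests, and filenames here are placeholders, not the contents of the actual CHTC template):

    universe                = docker
    # Assumption: whichever TensorFlow tag you pulled from the NGC catalog
    docker_image            = nvcr.io/nvidia/tensorflow:24.02-tf2-py3
    executable              = test_tensorflow.py
    request_gpus            = 1
    request_cpus            = 4
    request_memory          = 16GB
    # Pin the job to the DGX-1 in addition to the CUDA driver requirement
    Requirements            = ((Target.CUDADriverVersion >= 10.1) && (Machine == "REDLRADADM23589.ad.medctr.ucla.edu"))
    output                  = test_tensorflow.out
    error                   = test_tensorflow.err
    log                     = test_tensorflow.log
    should_transfer_files   = YES
    when_to_transfer_output = ON_EXIT
    queue

Submit it from the DGX-1 with condor_submit, as in the walkthrough above.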

Check current disk usage (internal only)

GPU servers have the following filesystems available for use:

Hostname         Share     Size (TB)  Backup
REDLRADADM14901  radraid   147        No
REDLRADADM23713  radraid2  147        No
REDLRADADM23589  raid      7          No
REDLRADADM14958  raid      35         No
REDLRADADM14959  raid      35         No
REDLRADADM18059  scratch   22         No
REDLRADADM18059  data      51         No
REDLRADADM21129  trials    22         Yes
REDLRADADM21129  data      30         No
REDLRADADM30333  scratch   32         No
REDLRADADM30333  images    22         No
REDLRADADM30333  cvib4     10         Yes
REDLRADADM05294  ciisraid  70         Yes
REDLRADADM23716  cvibraid  70         Yes

VM Info (internal only)

Server           Cores  Mem (GB)  Disk (TB)  VMs
REDLRADADM34545  24     128       1.63       11
REDLRADADM34546  32     512       2.72       24
REDLRADADM34547  40     512       3.27       33
REDLRADADM34548  24     512       2.54       27
REDLRADADM34549  40     512       6.11       25