HTCondor Quick Guide
Limitation
The recommended maximum number of CPUs allocated to a single user is 500. A user can request more than 500 CPUs while the cluster is idle. However, once other users start running their jobs and the number of free CPUs drops below 100, the cluster will start killing jobs until the free CPU count rises above 100 or no user occupies more than 500 CPUs.
Scenario A
It’s early in the morning and the cluster is mostly idle. Bob requests 900 CPUs and gets all 900 of them, leaving the cluster with 200 free CPUs. Later, Alice requests 150 CPUs and also gets all of them. The cluster now sees only 50 free CPUs left, so it kills some of Bob’s jobs until the free CPU count is back to 100.
Scenario B
Continuing from Scenario A, more users start work and more jobs are submitted to the cluster. The cluster keeps killing jobs to make room for the new users. Poor Bob finds his jobs keep disappearing until his jobs use a total of 500 CPUs. Bob’s remaining jobs will continue to run, but they can still be killed if they run for too long (see Job Preemption).
Memory Limits
The cluster does not have large memory nodes. The memory per core is as follows:
bl-3 nodes: 6.4 GB RAM per core
bl-4 nodes: 5.3 GB RAM per core
Memory usage is controlled via Linux cgroups, with a maximum limit of 5 GB per CPU on non-GPU nodes. For instance, if a job requests 6 CPUs (request_cpus = 6), it can utilize up to 6 x 5 GB = 30 GB of RAM.
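In submit-file form, that example looks like the following sketch (the executable name is illustrative):
Universe = vanilla
Executable = my_big_memory_program
# 6 CPUs x 5 GB = 30 GB RAM limit under cgroups
request_cpus = 6
should_transfer_files = NO
Queue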
Most jobs use only a small amount of RAM. Therefore, we’ve made an exception: a job that requests 3 CPUs or fewer still gets an upper limit of 20 GB of RAM. We hope this allows jobs that occasionally use relatively large amounts of memory to finish.
The above memory limit rules do not apply to GPU nodes and bl-hd nodes.
Quick Guide
This article will show you how to submit a computational job to the HTCondor system.
First of all, you should have your program prepared. For example, suppose it is called exampleA. The program accepts a number as its command-line argument: the number of events to simulate. If you execute the program directly on a node, you would type:
$ exampleA 10
to generate 10 simulation events.
Then you need a job file to run a larger simulation. It looks like this:
Universe = vanilla
Executable = exampleA
Arguments = 1000
Log = exampleA.log
Output = exampleA.out
Error = exampleA.error
# INPAC has shared filesystem, do NOT transfer file
should_transfer_files = NO
Queue
Save it under a name of your choice, for example job_file_1.
Let’s examine each of these lines:
Universe
: The vanilla universe means a plain old job. Later on, we’ll
encounter some special universes.
Executable
: The name of your program
Arguments
: These are the arguments you want. They will be the same
arguments we typed above.
Log
: This is the name of a file where Condor will record information
about your job’s execution. While it’s not required, it is a really
good idea to have a log.
Output
: Where Condor should put the standard output from your job.
Error
: Where Condor should put the standard error from your job. Our job
isn’t likely to have any, but we’ll put it there to be safe.
should_transfer_files
: The INPAC cluster has a shared filesystem, so this must be set to
NO. Occasionally we see users copy Condor job files from other clusters with
should_transfer_files set to YES. Do NOT do that; it has a significant impact
on cluster performance.
Next, tell Condor to run your job:
$ condor_submit job_file_1
That’s it. You can use the command condor_q to check the status of your job.
When the job is finished, you will find several new files: exampleA.log, exampleA.error, and exampleA.out. exampleA.out contains the program’s standard output, exampleA.error its standard error, and exampleA.log Condor’s record of the job’s execution.
You can also write a script to invoke your program, then set the script as the Executable parameter in the job file.
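For instance, a minimal wrapper script might look like the following sketch (the environment setup is a placeholder for whatever your program actually needs):
#!/bin/sh
# run_exampleA.sh: set up the environment, then run the real program.
# The export below is a placeholder; replace it with your own setup.
export MY_DATA_DIR=$HOME/data
exec ./exampleA "$@"
Set Executable = run_exampleA.sh in the job file and make the script executable with chmod +x run_exampleA.sh.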
For more information, please consult the HTCondor documentation.
Multi-threading Job
If the job’s executable is multi-threaded, request_cpus must be set in the job file; otherwise the job will be limited to a single CPU core, no matter how many threads it spawns. The job file may look like this:
executable = my_multi_threading_program
arguments = arg1 arg2 arg3
log = job.log
output = job.out
error = job.err
# request 8 CPU cores for the job
request_cpus = 8
# INPAC has shared filesystem, do NOT transfer file
should_transfer_files = NO
queue 1
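If the program reads its thread count from an environment variable such as OMP_NUM_THREADS, you can keep it consistent with request_cpus through the submit file’s environment command (a sketch; adjust the variable name to whatever your program actually reads):
request_cpus = 8
# keep the program's thread count consistent with the requested cores
environment = "OMP_NUM_THREADS=8"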
Each job can request at most 18 cores. If a job stays in the idle state as reported by condor_q, the cluster may not be able to match it to a slot with as many cores as requested. Running condor_q -analyze may reveal some clues.
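For example (the job ID below is illustrative):
$ condor_q -analyze 1234.0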
Job Preemption
Job preemption lets users get their fair share of cluster resources. Without it, a misbehaving job could run forever.
There are three types of computing nodes, in terms of how long a job is allowed to run without being forcefully interrupted:
A job can run for up to 24 hours without being preempted. This is the default type of node a job is assigned to.
A job can run for up to 2 hours. If all 24-hour nodes are occupied, new jobs go to this type of node.
A job can run for up to 60 days. Jobs go to the 60-day nodes when all 24-hour and 2-hour nodes are taken.
A user can also request that a job run on the 60-day nodes by adding a requirements line to the job file:
# request nodes that allow a job to run uninterrupted for more than 24 hours
Requirements = ( MaxJobRetirementTime > 86400 )
Don’t get too excited and make this configuration your “default”. It may put your jobs on a very long waiting list if all 60-day nodes are occupied.
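To check which nodes advertise long retirement times before relying on this, you can query the machine ads for the same attribute (a sketch; output formatting options may vary with your HTCondor version):
$ condor_status -af:h Machine MaxJobRetirementTime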
Singularity
An example condor job file that mounts /lustre/pandax/csmc into a singularity container and lists the directory:
Executable = /cvmfs/atlas.cern.ch/repo/containers/sw/singularity/x86_64-el7/current/bin/singularity
Universe = vanilla
getenv = True
Output = log/sgrun.out
Error = log/sgrun.err
Log = log/sgrun.log
# --bind binds the cluster filesystem into singularity
# centos6.sif is a singularity image pulled from sylabs.io using command:
# singularity build centos6.sif library://library/default/centos:6
Arguments = "exec --bind /lustre/pandax/csmc:/mnt \
/home/csmc/singularity/centos6.sif \
sh -c 'cat /etc/redhat-release && ls -F /mnt'"
# or run an unpacked centos 7 container from ATLAS. Bind mount both cvmfs
# and user directory on lustre
# Arguments = "exec --bind /cvmfs:/cvmfs \
# --bind /lustre/pandax/csmc:/mnt \
# /cvmfs/atlas.cern.ch/repo/containers/fs/singularity/x86_64-centos7-base \
# sh -c 'cat /etc/redhat-release \
# && ls -F /mnt \
# && ls -F /cvmfs/sft.cern.ch/lcg'"
# INPAC has shared filesystem, do NOT transfer file
should_transfer_files = NO
Queue
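Note that Condor does not create the log/ directory used by Output, Error, and Log; create it before submitting (the job-file name below is illustrative):
$ mkdir -p log
$ condor_submit singularity_job.sub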
Working Example for People without Patience
You can look at one of Prof. Yang Haijun’s examples of running condor jobs in the following folder on bl-1-1.physics.sjtu.edu.cn:
$ cd /home/yhj/condor-example/
$ ./makejob # script to make powhegbox MC generator jobs with different parameters, one job per folder
$ ./makejob-condor # script to make condor jobs and combine them into one file 'Mytest-data.condor', then submit all condor jobs
The powhegbox MC generator job is pwhg_main, and the condor job file is powhegZZ.sh.