HTCondor Quick Guide

Limitation

The recommended maximum number of CPUs allocated to a single user is 500. One can request more than 500 CPUs while the cluster is idle; however, once other users start running their jobs and the number of free CPUs drops below 100, the cluster will start killing jobs until the free CPU count rises above 100 or no user occupies more than 500 CPUs.

Scenario A

Early in the morning the cluster was mostly idle. Bob requested 900 CPUs and got all 900 he asked for, leaving the cluster with 200 free CPUs. Later Alice requested 150 CPUs and she also got all 150. The cluster now saw only 50 free CPUs left, so it killed some of Bob’s jobs to bring the free CPU count back up to 100.

Scenario B

Continuing from Scenario A, more users started working and more jobs were submitted to the cluster. The cluster kept killing jobs to make room for these new users. Poor Bob found his jobs kept disappearing until his jobs used a total of 500 CPUs. Bob’s remaining jobs will continue; however, they can still be killed if they run for too long (see Job Preemption).
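
If you want to check how close your own running jobs are to the 500-CPU recommendation, a one-liner along the following lines should work (a sketch using standard HTCondor job attributes; replace bob with your username):

$ condor_q bob -constraint 'JobStatus == 2' -af RequestCpus | awk '{s+=$1} END {print s+0}'

Here JobStatus == 2 selects running jobs, and RequestCpus is the number of cores each job asked for.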

Quick Guide

This article will show you how to submit a computational job to the HTCondor system.

First of all, you should have your program prepared. Suppose it is called exampleA and accepts a number as its command line argument, namely the number of events to be simulated. If you execute the program on the node directly, you may type:

$ exampleA 10

to generate 10 simulation events.

Then you need a job file for a larger simulation. It looks like this:

Universe   = vanilla
Executable = exampleA
Arguments  = 1000

Log        = exampleA.log
Output     = exampleA.out
Error      = exampleA.error

# INPAC has shared filesystem, do NOT transfer file
should_transfer_files = NO
Queue

Save it under a name of your choice, for example job_file_1.

Let’s examine each of these lines:

Universe: The vanilla universe means a plain old job. There are other, more specialized universes, but vanilla is what you will normally use on this cluster.

Executable: The name of your program.

Arguments: The command line arguments for your program, playing the same role as the number we typed above.

Log: This is the name of a file where Condor will record information about your job’s execution. While it’s not required, it is a really good idea to have a log.

Output: Where Condor should put the standard output from your job.

Error: Where Condor should put the standard error from your job. Our job isn’t likely to have any, but we’ll put it there to be safe.

should_transfer_files: The INPAC cluster has a shared filesystem, so this must be set to NO. Occasionally we see users copy a Condor job file from another cluster with should_transfer_files set to YES. Do NOT do that; it has a significant impact on cluster performance.
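
If you later want to run several independent copies of this job in one submission, the standard $(Process) macro can keep their output files apart; a minimal sketch (the file names are just examples):

Universe   = vanilla
Executable = exampleA
Arguments  = 1000

Log        = exampleA.log
Output     = exampleA.$(Process).out
Error      = exampleA.$(Process).error

# INPAC has shared filesystem, do NOT transfer file
should_transfer_files = NO
Queue 5

Queue 5 submits five jobs, with $(Process) taking the values 0 through 4.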

Next, tell Condor to run your job:

$ condor_submit job_file_1

That’s it. You can use the condor_q command to query the status of your job.
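
A few common variations of condor_q (standard HTCondor options; 123.0 is a placeholder for the job ID printed by condor_submit):

$ condor_q                  # summary of your jobs
$ condor_q -nobatch         # one line per job (recent HTCondor versions)
$ condor_q -analyze 123.0   # explain why job 123.0 has not started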

When the job has finished, you will find several new files: exampleA.log, exampleA.error and exampleA.out. They contain Condor’s log of the job and the output of the program.
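
If you want a command that simply blocks until the job is done, for example at the end of a shell script, the standard condor_wait tool can watch the log file (the log name here matches the example above):

$ condor_wait exampleA.log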

You can also write a script to invoke your program, then set the script as the Executable parameter in the job file.
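
A minimal sketch of such a wrapper, assuming a hypothetical script name run_exampleA.sh and that any environment setup your program needs goes inside it:

#!/bin/bash
# run_exampleA.sh -- a hypothetical wrapper around exampleA
# (put any environment setup your program needs here, e.g. sourcing a setup script)
exec exampleA "$@"   # assumes exampleA is on your PATH, as in the interactive example above

Make it executable with chmod +x run_exampleA.sh and point the job file at it:

Executable = run_exampleA.sh
Arguments  = 1000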

For more information, please consult the HTCondor documentation.

Multi-threading Job

If the job’s executable is multi-threaded, then request_cpus must be used in the job file; otherwise the job will be limited to a single CPU core, no matter how many threads it spawns. The job file may look like this:

executable = my_multi_threading_program
arguments = arg1 arg2 arg3
log    = job.log
output = job.out
error  = job.err

# request 8 CPU cores for the job
request_cpus = 8

# INPAC has shared filesystem, do NOT transfer file
should_transfer_files = NO
queue 1

Each job can request at most 18 cores. If a job stays in the idle status as reported by condor_q, the cluster may not be able to match it to a slot with as many cores as requested. Running condor_q -analyze may reveal some clues.
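
If your program takes its thread count from an environment variable, for example OMP_NUM_THREADS for OpenMP programs (an assumption about your executable; check its documentation), keep that variable consistent with request_cpus so the job does not try to use more cores than its slot provides:

request_cpus = 8
# quoted "new" environment syntax of condor_submit
environment = "OMP_NUM_THREADS=8"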

Job Preemption

Job preemption lets users get their fair share of cluster resources. Without it, a misbehaving job could run forever.

There are three types of computing nodes in terms of how long a job is allowed to run without being forcefully interrupted:

  • Jobs can run up to 24 hours without being preempted. This is the default type of node a job will be assigned to.

  • Jobs can run up to 2 hours. If all 24-hour nodes are occupied, new jobs will go to this type of node.

  • Jobs can run up to 60 days. Jobs will go to the 60-day nodes when all 24-hour and 2-hour nodes are taken.

Users can also explicitly request that a job run on the 60-day nodes by adding a Requirements line to the job file:

# request nodes that allow a job to run uninterrupted for more than 24 hours
Requirements = ( MaxJobRetirementTime > 86400 )

Don’t get too excited and make this setting your “default”. It may leave your jobs waiting in the queue for a very long time if all 60-day nodes are occupied.
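
To get an idea of how many such slots exist, condor_status can be filtered on the same attribute (this assumes MaxJobRetirementTime is advertised in the machine ClassAds, which the Requirements example above already relies on):

$ condor_status -total -constraint 'MaxJobRetirementTime > 86400'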

Singularity

An example Condor job file that mounts /lustre/pandax/csmc into a Singularity container and lists the directory:

Executable = /cvmfs/atlas.cern.ch/repo/containers/sw/singularity/x86_64-el7/current/bin/singularity
Universe = vanilla
getenv = True

Output = log/sgrun.out
Error  = log/sgrun.err
Log    = log/sgrun.log

# --bind binds the cluster filesystem into singularity
# centos6.sif is a singularity image pulled from sylabs.io using command:
#  singularity build centos6.sif library://library/default/centos:6
Arguments = "exec --bind /lustre/pandax/csmc:/mnt \
    /home/csmc/singularity/centos6.sif  \
    sh -c 'cat /etc/redhat-release && ls -F /mnt'"

# or run an unpacked centos 7 container from ATLAS. Bind mount both cvmfs
# and user directory on lustre
# Arguments = "exec --bind /cvmfs:/cvmfs \
#   --bind /lustre/pandax/csmc:/mnt \
#   /cvmfs/atlas.cern.ch/repo/containers/fs/singularity/x86_64-centos7-base \
#   sh -c 'cat /etc/redhat-release \
#          && ls -F /mnt \
#          && ls -F /cvmfs/sft.cern.ch/lcg'"

# INPAC has shared filesystem, do NOT transfer file
should_transfer_files = NO
Queue
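
Before submitting, it can be worth running the same command interactively on a node where /cvmfs and /lustre are mounted (an assumption about the login nodes) to catch typos in the bind paths or the image name:

$ /cvmfs/atlas.cern.ch/repo/containers/sw/singularity/x86_64-el7/current/bin/singularity \
    exec --bind /lustre/pandax/csmc:/mnt \
    /home/csmc/singularity/centos6.sif \
    sh -c 'cat /etc/redhat-release && ls -F /mnt'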

Working Example for People without Patience

Users can look at one of Prof. Yang Haijun’s examples of running Condor jobs, located in the following folder on bl-1-1.physics.sjtu.edu.cn:

$ cd /home/yhj/condor-example/

$ ./makejob  # script to make powhegbox MC generator jobs with different parameters, one job per folder

$ ./makejob-condor  # script to make condor jobs and combine them into one file 'Mytest-data.condor', then submit all condor jobs

The powhegbox MC generator executable is pwhg_main, and the Condor job file is powhegZZ.sh.