Using IBM LSF - UMass GHPCC User Wiki

High Performance
Computing (HPC)
Using IBM LSF
Presented by:
Al Ritacco, Mark Komarinski
Research Computing
UMASS Medical School
Information Services, 09/17/2012
Agenda
• Introduction to LSF
• Resource management
• Using the right queue(s)
• Shell scripting and LSF
• Tips and tricks
• Q&A
Intro to LSF
HPC Job submissions with LSF
• How do we efficiently create and submit jobs?
• LSF provides a scheduling engine that distributes
jobs across the cluster systematically
• Submit thousands of jobs without worrying about
where or when they will run
• You provide LSF with job requirements, and LSF
figures out where to put your jobs
Information Services,
5/14/2014
Submitting a job using LSF
• How do we submit jobs to LSF?
– We use the bsub command (batch submit) to
request our job run on the cluster.
• Job results returned via e-mail when complete
• $PATH is important when submitting jobs
• Job submissions must include memory and
time limits.
• If time or memory limits are exceeded, job will
be terminated
HPC Job submissions with LSF
• The following are required to run a job at MGHPCC
– How long your job needs to run, in H:M
– How much memory your job requires
– Defaults are applied if you do not specify them
• 1 hour of run time per job
• 1 GB of RAM per slot
• 1 core per job
• You probably won’t like these defaults
Where does the output go?
Output file options
• Example LSF options:
• -o $HOME/LSF_jobs_output/LSF_job.%J.out
– Send STDOUT to this file rather than by e-mail (%J expands to the job ID)
• -e $HOME/LSF_jobs_output/LSF_job.%J.err
– Send STDERR to this file rather than by e-mail
Submitting jobs via BASH
---- job.sh ----
#!/bin/bash
echo "Test"
sleep 30

$ bsub < job.sh
Job does not list memory required, please specify memory needed per slot (per cpu requested) in megabytes with "-R rusage[mem=####]"
Setting memory required to default of 1 gigabyte per slot: -R rusage[mem=1024]
Job runtime not indicated, please specify job runtime in minutes with "-W ##"
Setting wallclock time to default of 60 minutes: -W 60
Job <174223> is submitted to default queue <long>.
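The two warnings above go away once the script states its own limits. A minimal sketch, assuming illustrative values (10 minutes, 1024 MB per slot) and an output file name that are not from the original slide; the submission is skipped when bsub is not installed:

```shell
# Write a version of job.sh that carries its own time and memory limits.
# The limits and the output file name are illustrative assumptions.
cat > job.sh <<'EOF'
#!/bin/bash
#BSUB -W 0:10
#BSUB -R "rusage[mem=1024]"
#BSUB -o job.%J.out
echo "Test"
sleep 30
EOF

# Submit only where LSF is available; otherwise just show the command.
if command -v bsub >/dev/null 2>&1; then
    bsub < job.sh
else
    echo "dry run: bsub < job.sh"
fi
```

Because the #BSUB lines are ordinary comments to bash, the same file still runs directly with `bash job.sh` for a quick local test.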
Checking the status of job(s)
• BJOBS – Shows your pending and running jobs
$ bjobs
JOBID   USER  STAT  QUEUE  FROM_HOST  EXEC_HOST  JOB_NAME    SUBMIT_TIME
174224  awr   PEND  long   ghpcc06               *;sleep 30  May 13 11:14
$ bjobs
JOBID   USER  STAT  QUEUE  FROM_HOST  EXEC_HOST  JOB_NAME    SUBMIT_TIME
174224  awr   RUN   long   ghpcc06    c06b01     *;sleep 30  May 13 11:14
$ bjobs -u all -r # Shows running jobs for ALL users
$ bjobs -a # Shows ALL jobs (including completed)
JOBID   USER  STAT  QUEUE  FROM_HOST  EXEC_HOST  JOB_NAME    SUBMIT_TIME
174224  awr   DONE  long   ghpcc06    c06b01     *;sleep 30  May 13 11:14
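When a script needs the job state rather than the whole table, the STAT column can be picked out of the default bjobs layout. A minimal sketch, assuming the column order shown above; the helper name job_stat is made up for illustration, and the sample text is copied from the slide so the snippet runs without LSF:

```shell
# Print the STAT column (third field) of the first job row read from stdin.
# job_stat is a hypothetical helper name, not an LSF command.
job_stat() {
    awk 'NR == 2 { print $3 }'
}

# Sample output copied from the slide, so this runs without a cluster.
sample='JOBID   USER  STAT  QUEUE  FROM_HOST  EXEC_HOST  JOB_NAME    SUBMIT_TIME
174224  awr   RUN   long   ghpcc06    c06b01     *;sleep 30  May 13 11:14'

printf '%s\n' "$sample" | job_stat

# On the cluster (assumes bjobs is on $PATH) a polling loop could look like:
#   while s=$(bjobs 174224 | job_stat); [ "$s" = PEND ] || [ "$s" = RUN ]; do
#       sleep 30
#   done
```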
Killing a running job
• BKILL – Stops a job (or sends it a specific signal)
$ bkill {jobid} # stop a specific job
$ bkill 0 # stop all your jobs
$ bkill -q long 0 # stop all your jobs in the long queue
Peeking at a running job
• BPEEK – view the output (stdout/stderr) of a running job
• Note: bpeek does not fflush(), so output the job has buffered but not yet written will not appear
$ bpeek 4587
Output
Suspending Jobs
• BSTOP – put a job in the suspended state
$ bstop 174224 # put job on hold
$ bjobs 174224
JOBID   USER  STAT   QUEUE  FROM_HOST  EXEC_HOST  JOB_NAME    SUBMIT_TIME
174224  awr   USUSP  long   ghpcc06    c06b01     *;sleep 30  May 13 11:14
Resume a suspended Job
• BRESUME JOBID – Resume a suspended job
$ bresume 174224
$ bjobs 174224
JOBID   USER  STAT  QUEUE  FROM_HOST  EXEC_HOST  JOB_NAME    SUBMIT_TIME
174224  awr   RUN   long   ghpcc06    c06b01     *;sleep 30  May 13 11:14
Which queues are available?
• BQUEUES lists available queue resources
$ bqueues
QUEUE_NAME       PRIO  STATUS       MAX   JL/U  JL/P  JL/H  NJOBS  PEND  RUN   SUSP
priority         100   Open:Active  -     -     -     -     0      0     0     0
interactive      90    Open:Active  -     5     -     -     3      0     3     0
short            80    Open:Active  2854  -     -     -     1378   628   750   0
parallel         70    Open:Active  2854  -     -     -     6049   3195  2721  0
long             60    Open:Active  -     -     -     -     1268   372   708   112
gpu              50    Open:Active  -     -     -     -     0      0     0     0
condo_uma_haiyi  30    Open:Active  -     -     -     -     6912   6432  480   0
condo_grid       20    Open:Active  -     -     -     -     0      0     0     0
How does LSF schedule jobs?
• LSF selects which job to run next based on:
– Resource requirements of the job
• Queue
• Memory
• Time
• Job requirements
– Current load conditions
– How many slots are available to backfill
LSF Load
• lsload – Current load
$ lsload
HOST_NAME  status  r15s   r1m    r15m   ut   pg   ls  it    tmp    swp   mem
c02b02     ok      87.8   78.5   68.1   93%  0.0  0   8320  775G   0M    434.7G
ghpcc-sgi  ok      335.0  516.5  519.5  29%  0.0  1   1167  1776M  7.9G  3.6T
status – Is the host accepting new jobs?
r15s/r1m/r15m – Run-queue load averaged over 15 seconds, 1 minute, and 15 minutes
ut – CPU utilization
tmp – Temp space available
swp – Swap space available
mem – Memory available
LSF Hosts
• BHOSTS – show hosts and their status
$ bhosts
HOST_NAME  STATUS  JL/U  MAX  NJOBS  RUN  SSUSP  USUSP  RSV
c10b08     closed  -     64   64     64   0      0      0
ghpcc-sgi  ok      -     512  489    489  0      0      0
STATUS – Shows whether the host is accepting new jobs
JL/U – Max job slots per user
MAX – Max job slots per host
NJOBS – Job slots in use on this host
User Priority
• BQUEUES can display priorities for a queue
$ bqueues -l short
SHARE_INFO_FOR: short/
USER/GROUP  SHARES  PRIORITY  STARTED  RESERVED  CPU_TIME     RUN_TIME   ADJUST
SHARED_TOP  1       0.000     4172     291       790322880.0  921445248  0.000
.
.
.
HOSTS: ghpcc-sgi+10 c01+9 c02+8 c03+7 c04+6 c05+5 c06+4 c07+3 c08+2 c09+1
JOB_STARTER: source /lsf/9.1/linux2.6-glibc2.3x86_64/etc/GLOBAL_JOB_ENVIRONMENT ; bash -c '%USRCMD'
PREEMPTION: PREEMPTIVE[long] PREEMPTABLE[priority interactive]
MAX_JOB_PREEMPT: 4
MAX_TOTAL_TIME_PREEMPT: 360 minute(s)
Environment Variables
• LSF Variables (a few)
• LSB_JOBID
– The LSF Batch job ID number.
• LSB_TASKID – Array task #
• LSB_HOSTS
– Hosts assigned to run on
• LSB_RESTART
– Is this job restart-able?

Example script with variables from LSF:
#!/bin/bash
#BSUB -o "/home/awr/%J.out"
#BSUB -e "/home/awr/%J.err"
echo "LSB_JOBID = $LSB_JOBID"
echo "LSB_JOBFILENAME = $LSB_JOBFILENAME"
echo "LSB_HOSTS = $LSB_HOSTS"
echo "LSB_QUEUE = $LSB_QUEUE"
echo "LSB_JOBNAME = $LSB_JOBNAME"
echo "LSB_RESTART = $LSB_RESTART"
echo "LSB_EXIT_PRE_ABORT = $LSB_EXIT_PRE_ABORT"
echo "LSB_EXIT_REQUEUE = $LSB_EXIT_REQUEUE"
echo "LSB_JOB_STARTER = $LSB_JOB_STARTER"
echo "LSB_INTERACTIVE = $LSB_INTERACTIVE"
echo "LS_JOBPID = $LS_JOBPID"
echo "LS_SUBCWD = $LS_SUBCWD"
echo "LSB_TASKID = $LSB_TASKID"
Env Variables, cont.
• LSB_EXIT_PRE_ABORT
– A queue-level or job-level pre-execution command
can exit with this value when it wants the job to
be aborted instead of being requeued or executed.
Job State(s)
• State Diagram of LSF jobs
• PEND
– waiting in the queue
• RUN
– dispatched to a host and running
• DONE
– terminated normally
Job State(s), cont.
• A job remains pending until all conditions for its
execution are met. These conditions include:
– Start time specified by the user when the job is
submitted
– Load conditions on qualified hosts
– Time windows during which the job's queue can
dispatch jobs and qualified hosts accept jobs
– Job limits imposed by the configured policy for each
user, queue, and host
– Relative priority to other users and jobs
– Availability of the specified resources
Job State(s), cont.
• A job may terminate abnormally for various
reasons. An abnormally terminated job goes
into EXIT state. The situations where a job
terminates abnormally include:
– The job is cancelled by its owner or the LSF
administrator while pending, or after being
dispatched
– The job is not able to be dispatched before it
reaches its termination deadline and thus is
aborted by LSF Batch
Job State(s), cont.
• The situations where a job terminates
abnormally include:
– The job fails to start successfully. For example, the
wrong executable is specified by the user when
the job is submitted
– The job crashes during execution
Job State(s), cont.
• Jobs may also be suspended at any time. A job
can be suspended by its owner, by the LSF
administrator or by the LSF Batch system.
There are three different states for suspended
jobs:
– PSUSP suspended by its owner or the LSF
administrator while in PEND state
– USUSP suspended by its owner or the LSF
administrator after being dispatched
– SSUSP suspended by the LSF Batch system after
being dispatched
Queue Setup
• Fairshare with queue-based pre-emption possible
• Short – Up to 12 hours
– Can pre-empt other queues
• Long – More than 12 hours
• Interactive – Up to 12 hours
– Can pre-empt other queues
• Parallel
– Used for true parallel jobs
Parallel versus single
• What is a single job?
– Uses one core/slot and runs in a single namespace
• What is a parallel job?
– Uses more than one core/slot and runs in a single
or multiple namespace context
• What is a threaded job?
– A parallel job which runs in a single namespace
• What is an MPI job?
– A parallel job which runs in multiple namespaces
Resource Management
Why manage resources?
• Accurately listing requirements means your
job gets dispatched faster
• Does your job scale well?
– Doubling the core count may not halve run time
• Can it be distributed?
Resource Requests
• Submitting a job with basic resources:
• Memory
– Using the -R option
• Request ~16 GB of RAM
$ bsub -R "rusage[mem=16000]" script.sh
Resource Requests, cont.
• How much time does your job need?
– Estimate the amount of time at first
– If you know it takes 1 hour for 1 CHR, and you are
analyzing 4 CHR, then figure 5 hours and see
where things end up
• Use the -W H:M resource request:
– #BSUB -W 5:00
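The estimate above is plain arithmetic: time per unit, times the number of units, plus some headroom. A small sketch of that calculation; the function name walltime_hm and the 25% margin are assumptions chosen so the result matches the 5-hour figure on the slide:

```shell
# walltime_hm MIN_PER_UNIT UNITS MARGIN_PCT
# Prints an H:MM string suitable for "bsub -W".
# The function name and the margin parameter are illustrative assumptions.
walltime_hm() {
    local total=$(( $1 * $2 ))
    # Add the safety margin, rounding up to a whole minute.
    total=$(( total + (total * $3 + 99) / 100 ))
    printf '%d:%02d\n' $(( total / 60 )) $(( total % 60 ))
}

# 4 units at 60 minutes each, plus a 25% margin.
walltime_hm 60 4 25
```

After the first run, the job report shows the real run time, and the margin can be tightened from there.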
Resource Requests, cont.
• Which queue do we want to run in?
– Recall the different types of queues and select one
– Default is long
• Select the queue with -q {queue name}
• #BSUB -q short
Resource Requests, cont.
• How do I run a threaded application?
– All cores/slots need to be on the same host
– All memory is in the same namespace
• Use the -R "span" resource request to make
sure that you have ALL cores on ONE host:
– #BSUB -R "span[hosts=1]"
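A threaded program also needs to be told how many threads to start, or it may use more or fewer cores than were requested. A minimal sketch for an OpenMP-style program, assuming LSF sets LSB_DJOB_NUMPROC to the number of allocated slots inside the job; the program name is hypothetical:

```shell
#!/bin/bash
#BSUB -n 4
#BSUB -R "span[hosts=1]"

# LSB_DJOB_NUMPROC is assumed to hold the slot count inside an LSF job.
# Fall back to 1 so the script also runs by hand outside LSF.
threads="${LSB_DJOB_NUMPROC:-1}"
export OMP_NUM_THREADS="$threads"
echo "Running with $OMP_NUM_THREADS thread(s)"
# ./my_threaded_program    # hypothetical program name
```

Deriving the thread count from the allocation keeps the -n value as the single place to change when scaling the job up or down.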
Resource Requests, Cont.
• How do I run an MPI job?
– Cores/slots can reside on any host
• Optimal setup is to keep as many slots as
possible on the same host
– Namespace is independent
– Load the MPI module
• -R "span" with ptile
• Example: 12 slots with 6 slots for each host
– #BSUB -n 12
– #BSUB -R "span[ptile=6]"
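To check that a ptile request produced the expected layout, the job can count how many slots it received on each host. A minimal sketch, assuming LSB_HOSTS lists one host name per allocated slot; the helper name and the example host list are made up for illustration:

```shell
# Count slots per host from an LSB_HOSTS-style list
# (space-separated, one host name per allocated slot).
# slots_per_host is a hypothetical helper name.
slots_per_host() {
    printf '%s\n' $1 | sort | uniq -c | awk '{ print $2, $1 }'
}

# Made-up host list matching "-n 12" with "span[ptile=6]".
example="c01b01 c01b01 c01b01 c01b01 c01b01 c01b01 c02b03 c02b03 c02b03 c02b03 c02b03 c02b03"

# Inside a job LSB_HOSTS is used; outside LSF the example list stands in.
slots_per_host "${LSB_HOSTS:-$example}"
```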
Ex: LSF Submission with resources
Say we need 15GB of available RAM and 4 slots/CPUs
for our job to run. We still use the bsub command,
but with options:
• -R rusage[mem=X] – Request X MB of RAM
• -n X – Request X slots (cores)
– Job slots are not guaranteed to be on the same
system unless we ask LSF to place them together
$ bsub -R "rusage[mem=15360]" -n 4 ./vmd.sh
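The submission message shown earlier in this deck describes the memory request as per slot. If that applies here, mem=15360 with -n 4 reserves 15360 MB for each of the four slots; whether the slide intends that or a 15GB total is not stated. A small sketch of the per-slot arithmetic for the case where the figure is a total (the function name is an assumption):

```shell
# per_slot_mb TOTAL_GB SLOTS
# Prints the MB to request per slot so the slots together cover TOTAL_GB,
# rounded up. The function name is an illustrative assumption.
per_slot_mb() {
    echo $(( ($1 * 1024 + $2 - 1) / $2 ))
}

per_slot_mb 15 1    # all 15 GB in a single slot
per_slot_mb 15 4    # 15 GB shared evenly by 4 slots
```

The job report for a test run shows the memory actually used, which is the safest basis for picking the final value.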
Measuring resources used
• Job report contains memory and CPU used
• Also contains the amount of time the job was
pending and running
• GHPCC optimized for short (<12H) jobs
• Test software with larger requirements and
use that as a gauge
• Run production with slightly scaled up
numbers
LSF based array of jobs
• Say you have 1000 files that you need to
process and they're numbered sequentially,
file.1 through file.1000
• bsub -W 1:0 -R "rusage[mem=1024]" \
-J "myarray[1-1000]" \
"process file.\$LSB_JOBINDEX"
• This will run jobs 1..1000, one slot per job
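Inside a job script, the same mapping from array index to input file can be written out explicitly, which also makes it easy to test one element by hand. A minimal sketch; the helper name is made up, and `process` is the placeholder command from the slide:

```shell
# Map an array index to its input file name (file.1 ... file.1000).
# input_for_index is a hypothetical helper name.
input_for_index() {
    echo "file.$1"
}

# LSB_JOBINDEX is set by LSF for each array element;
# default to 1 so the script can be tried outside LSF.
index="${LSB_JOBINDEX:-1}"
input="$(input_for_index "$index")"
echo "Array element $index would process $input"
# process "$input"    # placeholder command from the slide
```

LSF's array syntax also accepts a trailing %N on the job name, for example "myarray[1-1000]%50", to cap how many elements run at the same time.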
LSF based resource requests
• Non-threaded job example:
– A job requiring: 4 cores, a run time of 50 minutes,
and 1GB of memory per job slot:
– bsub -q short -n 4 -W 0:50 \
-R "rusage[mem=1024]" \
-R "span[hosts=1]" \
./myparalleljob.sh
• Here we run in the short queue as we are
looking for singleton based jobs
Resource Requests, cont.
• Threaded jobs
• A job requiring: 4 cores, a run time of 50 minutes,
and 1GB of memory per job slot:
• bsub -q parallel -n 4 -W 0:50 \
-R "span[hosts=1]" -R \
"rusage[mem=1024]" ./mythreadedjob.sh
• Here we request the parallel queue because this
job, although it does not run on separate hosts,
is still considered parallel in nature
Resource Requests, cont.
• GPU resource requests
– Single GPU:
• #BSUB -q gpu -a gpuexcl_p
– Both GPUs on a host:
• #BSUB -q gpu -a gpuexcl2_p
– Resource requests for memory, and time are still the
same here
– Note, these hosts are X86 – Intel and not AMD nodes
– If you are using CUDA don’t forget to load the
module
Using the right queue
Why use the correct queue?
• Match requirements to resources
• Jobs dispatch quicker
• Better for entire cluster
• Help GHPCC staff determine when new
resources are needed
Short
• Cluster currently optimized for short jobs
• Easier to get jobs dispatched
• If your job can be split into smaller chunks, do
it
• Limited to 80% of cores
Parallel
• For jobs that can be split across multiple
systems
• Usually MPI jobs
• Usually fast dispatch
• Doesn’t get suspended
• Limited to 80% of cores
Long
• Don’t be afraid to submit long jobs
• Usually short on free resources
• Jobs can run for up to 30 days
• Parallel threaded applications that run on the
same node ("span[hosts=1]")
Interactive
• Get your own CPU and memory allocation
• Great for compiling software
• Or downloading data
• Or testing software
• Limited to 5 concurrent jobs per user
• Limited to 8 hours per job
Scripting Jobs
Using bash with LSF
• Your shell script can include LSF commands
• Or write a separate wrapper
• Reduce redundancy
• Avoid errors when submitting
Submitting scripts to LSF
• You can include bsub options in a shell script
at the top of the file
• Prefix each line with #BSUB
• One per line
• Script will run as normal without LSF so you
can test
• Rest of script is a normal shell script
Demo
#!/bin/bash
#BSUB -W 00:10
#BSUB -n 1
#BSUB -R "rusage[mem=1024]"
#BSUB -J "myTask[1-80]"
#BSUB -o logs/out.%J.%I
echo "Hello Job $LSB_JOBID Task $LSB_JOBINDEX"
Passing options to shell scripts
• This works:
bsub myjob.sh opt1 opt2 opt3
• This won’t:
bsub < myjob.sh opt1 opt2 opt3
• Why and now what?
Passing options to shell scripts
• Write a wrapper for bsub that passes arguments
#!/bin/bash
bsub -q short myjob.sh $1 $2 $3
• Then run ./mywrapper.sh opt1 opt2 opt3
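The wrapper above forwards exactly three arguments. A variation using "$@" forwards however many are given. This is a sketch rather than the presenters' script: the function name submit is an assumption, and it prints the command instead of running it when bsub is not installed:

```shell
#!/bin/bash
# mywrapper.sh variant: forward every argument to myjob.sh.
# submit is a hypothetical function name.
submit() {
    if command -v bsub >/dev/null 2>&1; then
        bsub -q short myjob.sh "$@"
    else
        # No LSF here, so only show what would have been submitted.
        echo "dry run: bsub -q short myjob.sh $*"
    fi
}

# Submit only when arguments were supplied on the command line.
if [ "$#" -gt 0 ]; then
    submit "$@"
fi
```

Run it the same way as before: ./mywrapper.sh opt1 opt2 opt3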
Exit codes
• Non-zero exit = EXIT
• Zero exit = DONE
• Tracked separately by LSF, may be in metrics
• Not all non-zero exit jobs really failed
• Up to you to track jobs
– But contact us if you have questions!
Tips and Tricks
LSF summary commands
• We have created some convenient tools which display LSF
and other system command output
• Must load hpctools/1.0.0 module
• cju
[awr@ghpcc06 ~]$ cju
Username  #JOBS    #Cores in use  #Jobs Queued   Full Name        Phone #       PI (campus_PI)
------------------------------------------------------------------------------------------------
hg49a     3 jobs   2208 core(s)   0 Jobs Q/Susp  Gao,Haiying      xxx-xxx-xxxx  uma_haiying_gao
fm19w     2 jobs   256 core(s)    0 Jobs Q/Susp  Massi,Francesca  xxx-xxx-xxxx  umw_francesca_massi
lb33d     3 jobs   120 core(s)    0 Jobs Q/Susp  Berard,Leandre   xx-xxx-xxxx   umd_mehdi_raessi
nnk60a    82 jobs  82 core(s)     0 Jobs Q/Susp  Khan,Navaid      xxx-xxx-xxxx  uma_peter_monson
sc45a     13 jobs  13 core(s)     0 Jobs Q/Susp  Chien,Szu-Chia   xxx-xxx-xxxx  uma_peter_monson
cjh       1 jobs   1 core(s)      4 Jobs Q/Susp  Hull,Chris       xxx-xxx-xxxx  umw_rc
Total cores used: 2679
Total cores available: 2336
LSF summary commands
• More summary commands:
$ ac – Available CPUs
$ tc – Total CPUs
$ uc – Used CPUs
$ home_used
• See our wiki for real time stats:
http://wiki.umassrc.org/
HPC Storage
• Disk usage best practices
• Archive your data
– Make backups of your data on mid-long term storage
• Use local storage if possible
– Local storage always faster than network
• Don’t use farline for cluster processing
HPC Best practices
• When submitting a large number of jobs
please consider:
– Single CPU jobs versus multi CPU Jobs
– Correct amount of memory for your job
– Job Arrays
– Job dependencies
Job dependencies
• Have jobs not start until specified events
happen
• Lots of options for event: done, exit
• Jobs in queue, but not eligible to dispatch
until dependencies are met
• Be sure to track and remove ‘dead’ jobs
Job dependency examples
Wait until jobid 174333 successfully completes:
$ bsub sleep 600
$ bsub -w "done(174333)" echo "hello"
Wait until all jobs in array have finished:
$ bsub -J "myjob[1-100]" sleep 60
$ bsub -w "done(myjob[*])" echo "hello"
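In a script, the job ID for the dependency has to be captured from bsub's output rather than typed in. A minimal sketch, assuming the confirmation line has the "Job <id> is submitted ..." format shown earlier in this deck; the helper name job_id_from is made up:

```shell
# Extract the numeric job ID from bsub's confirmation line on stdin.
# job_id_from is a hypothetical helper name.
job_id_from() {
    sed -n 's/^Job <\([0-9][0-9]*\)> is submitted.*/\1/p'
}

# Confirmation line in the format shown earlier in this deck.
line='Job <174223> is submitted to default queue <long>.'
first_id=$(printf '%s\n' "$line" | job_id_from)
echo "bsub -w \"done($first_id)\" echo hello"

# On the cluster (assumes bsub is on $PATH) this could become:
#   first_id=$(bsub sleep 600 | job_id_from)
#   bsub -w "done($first_id)" echo "hello"
```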
HPC Best practices cont.
• The earlier you submit your jobs, the earlier
they can obtain the LSF resources they need.
• Redirect all LSF output to one directory for
convenience
• Add one of the following to your LSF job directives
(redirects stdout/stderr)
#BSUB -o $HOME/LSF_jobs_output/LSF_job.%J.out
For job arrays, include the array index with %I:
#BSUB -o $HOME/LSF_jobs_output/LSF_job.%J.%I.out
HPC Best practices cont.
LSF Queues and policies
• Fair share attempts to equalize CPU (slot) resources for
Labs and users at job submission.
• The priority of a job is calculated in relation to other
submitted jobs. The priority for jobs will change as
jobs complete and job slots become available
• All labs start with an equal weight
• Each lab member shares in this weight when
submitting jobs
• Weights are measured from job submissions per user
and per lab
• Weights are based on CPU time used and a decay time
Pre-exec and post-exec
• Run commands before a job starts and after
it completes
• Great for copying data to and from local disk
• -E for pre_exec
• -Ep for post_exec
Using pre-exec and post-exec
• During your job execution
– Pre exec:
• Create a folder layout /tmp/jobid_$LSB_JOBID
• Transfer data files if needed
– Post exec:
• Transfer data files back to central storage
• remove files which you no longer require
• Make offsite backup copy
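The same stage-in, compute, stage-out pattern can also be written directly in the job script instead of through -E and -Ep. The sketch below takes that in-script approach, which is a substitute for the pre-exec and post-exec options rather than an example of them; the file names are hypothetical and "manual" stands in for the job ID when run outside LSF:

```shell
# Stage work in a per-job directory on local disk, then clean up.
# LSB_JOBID is set by LSF inside a job; "manual" is a stand-in for local tests.
scratch="/tmp/jobid_${LSB_JOBID:-manual}"

mkdir -p "$scratch"
# cp "$HOME/input.dat" "$scratch/"       # hypothetical: copy inputs to local disk
echo "result" > "$scratch/output.txt"     # stand-in for the real computation
# cp "$scratch/output.txt" "$HOME/"       # hypothetical: copy results back
result=$(cat "$scratch/output.txt")

# Remove the scratch directory so local disk is left clean for the next job.
rm -rf "$scratch"
echo "Finished with scratch directory $scratch"
```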
Q&A
• Don’t be shy – everyone has a different work
flow and we want to help you