High Performance Computing (HPC) Using IBM LSF
Presented by: Al Ritacco, Mark Komarinski
Research Computing
UMASS Medical School
Information Services, 09/17/2012

Agenda
• Introduction to LSF
• Resource management
• Using the right queue(s)
• Shell scripting and LSF
• Tips and tricks
• Q&A

Intro to LSF

HPC Job submissions with LSF
• How do we efficiently create and submit jobs?
• With LSF we can take advantage of a scheduling engine to distribute jobs in an optimal and systematic way
• Submit thousands of jobs without worrying about where or when they will run
• You provide LSF with job requirements, and LSF figures out where to put your jobs

Information Services, 5/14/2014

Submitting a job using LSF
• How do we submit jobs to LSF?
– We use the bsub command (batch submit) to request that our job run on the cluster.
• Job results are returned via e-mail when complete
• $PATH is important when submitting jobs
• Job submissions must include memory and time limits.
• If time or memory limits are exceeded, the job will be terminated

HPC Job submissions with LSF
• The following are required to run a job at MGHPCC
– How long your job requires to run, in H:M
    • A default of one hour will be set if no time is requested
– How much memory your job requires
    • 1GB of RAM will be the default per slot
– Defaults will be provided if you do not give them
    • 1 hour will be the default time for a job
    • 1 core will be made available for your job
    • You probably won’t like these defaults

Where does the output go?
Output file options
• Example LSF options:
• -o $HOME/LSF_jobs_output/LSF_job.%J.out
– Send STDOUT to this file rather than sending it via e-mail
• -e $HOME/LSF_jobs_output/LSF_job.%J.err
– Send STDERR to this file rather than sending it via e-mail

Submitting jobs via BASH
    ---- job.sh ----
    #!/bin/bash
    echo "Test"
    sleep 30

    $ bsub < job.sh
    Job does not list memory required, please specify memory needed per slot (per cpu requested) in megabytes with "-R rusage[mem=####]"
    Setting memory required to default of 1 gigabyte per slot: -R rusage[mem=1024]
    Job runtime not indicated, please specify job runtime in minutes with "-W ##"
    Setting wallclock time to default of 60 minutes: -W 60
    Job <174223> is submitted to default queue <long>.

Checking the status of job(s)
• BJOBS – Shows your running jobs
    $ bjobs
    JOBID   USER  STAT  QUEUE  FROM_HOST  EXEC_HOST  JOB_NAME    SUBMIT_TIME
    174224  awr   PEND  long   ghpcc06               *;sleep 30  May 13 11:14

    $ bjobs
    JOBID   USER  STAT  QUEUE  FROM_HOST  EXEC_HOST  JOB_NAME    SUBMIT_TIME
    174224  awr   RUN   long   ghpcc06    c06b01     *;sleep 30  May 13 11:14

    $ bjobs -u all -r   # Shows ALL users' running jobs
    $ bjobs -a          # Show ALL jobs (including completed)
    JOBID   USER  STAT  QUEUE  FROM_HOST  EXEC_HOST  JOB_NAME    SUBMIT_TIME
    174224  awr   DONE  long   ghpcc06    c06b01     *;sleep 30  May 13 11:14

Killing a running job
• BKILL – Stops a job (or sends a specific signal)
    $ bkill {jobid}       # stop a specific jobid
    $ bkill 0             # stop all your jobs
    $ bkill -q long 0     # stop all your jobs in the long queue

Peeking at a running job
• BPEEK – view the output (stdout/stderr) of a running job
• Note: bpeek does not fflush()
    $ bpeek 4587
    Output

Suspending Jobs
• BSTOP – put a job in suspend state
    $ bstop 174224   # put job on hold
    JOBID   USER  STAT   QUEUE  FROM_HOST  EXEC_HOST  JOB_NAME    SUBMIT_TIME
    174224  awr   USUSP  long   ghpcc06    c06b01     *;sleep 30  May 13 11:14
Resume a suspended Job
• BRESUME JOBID – Resume job
    $ bresume 174224
    JOBID   USER  STAT  QUEUE  FROM_HOST  EXEC_HOST  JOB_NAME    SUBMIT_TIME
    174224  awr   RUN   long   ghpcc06    c06b01     *;sleep 30  May 13 11:14

Which queues are available?
• BQUEUES lists available queue resources
    $ bqueues
    QUEUE_NAME       PRIO  STATUS       MAX   JL/U  JL/P  JL/H  NJOBS  PEND  RUN   SUSP
    priority         100   Open:Active  -     -     -     -     0      0     0     0
    interactive      90    Open:Active  -     5     -     -     3      0     3     0
    short            80    Open:Active  2854  -     -     -     1378   628   750   0
    parallel         70    Open:Active  2854  -     -     -     6049   3195  2721  0
    long             60    Open:Active  -     -     -     -     1268   372   708   112
    gpu              50    Open:Active  -     -     -     -     0      0     0     0
    condo_uma_haiyi  30    Open:Active  -     -     -     -     6912   6432  480   0
    condo_grid       20    Open:Active  -     -     -     -     0      0     0     0

How does LSF schedule jobs?
• LSF selects which job to run next based on:
– Resource requirements of the applications
    • Queue
    • Memory
    • Time
    • Job requirements
– Current load conditions
– How many slots are available to backfill

LSF Load
• lsload – Current load
    $ lsload
    HOST_NAME  status  r15s   r1m    r15m   ut   pg   ls  it    tmp    swp   mem
    c02b02     ok      87.8   78.5   68.1   93%  0.0  0   8320  775G   0M    434.7G
    ghpcc-sgi  ok      335.0  516.5  519.5  29%  0.0  1   1167  1776M  7.9G  3.6T
Status – Accepting new jobs?
r15s/r1m/r15m – What is the load index currently?
Ut – CPU utilization
Tmp – Temp space available
Swp – Swap space available
Mem – Memory available

LSF Hosts
• BHOSTS – show hosts and their status
    $ bhosts
    HOST_NAME STATUS JL/U MAX NJOBS RUN SSUSP USUSP RSV
    c10b08 closed 64 64 64 0 0 0
    ghpcc-sgi ok 512 489 489 0 0 0
Status – Shows if host is accepting new jobs
JL/U – Max job slots available (per user) ALL
MAX – Max job slots per host
NJOBS – Jobs running on this host (slots used)

User Priority
• BQUEUES can display priorities for a queue
    $ bqueues -l short
    SHARE_INFO_FOR: short/
    USER/GROUP  SHARES  PRIORITY  STARTED  RESERVED  CPU_TIME     RUN_TIME   ADJUST
    SHARED_TOP  1       0.000     4172     291       790322880.0  921445248  0.000
    . . .
    HOSTS: ghpcc-sgi+10 c01+9 c02+8 c03+7 c04+6 c05+5 c06+4 c07+3 c08+2 c09+1
    JOB_STARTER: source /lsf/9.1/linux2.6-glibc2.3x86_64/etc/GLOBAL_JOB_ENVIRONMENT ; bash -c '%USRCMD'
    PREEMPTION: PREEMPTIVE[long] PREEMPTABLE[priority interactive]
    MAX_JOB_PREEMPT: 4
    MAX_TOTAL_TIME_PREEMPT: 360 minute(s)

Environment Variables
• LSF Variables (a few)
• LSB_JOBID
– The LSF Batch job ID number.
• LSB_TASKID
– Array task #
• LSB_HOSTS
– Hosts assigned to run on
• LSB_RESTART
– Is this job restart-able?

    # Example Script with variables from LSF
    #!/bin/bash
    #BSUB -o "/home/awr/%J.out"
    #BSUB -e "/home/awr/%J.err"
    echo "LSB_JOBID = $LSB_JOBID"
    echo "LSB_JOBFILENAME = $LSB_JOBFILENAME"
    echo "LSB_HOSTS = $LSB_HOSTS"
    echo "LSB_QUEUE = $LSB_QUEUE"
    echo "LSB_JOBNAME = $LSB_JOBNAME"
    echo "LSB_RESTART = $LSB_RESTART"
    echo "LSB_EXIT_PRE_ABORT = $LSB_EXIT_PRE_ABORT"
    echo "LSB_EXIT_REQUEUE = $LSB_EXIT_REQUEUE"
    echo "LSB_JOB_STARTER = $LSB_JOB_STARTER"
    echo "LSB_INTERACTIVE = $LSB_INTERACTIVE"
    echo "LS_JOBPID = $LS_JOBPID"
    echo "LS_SUBCWD = $LS_SUBCWD"
    echo "LSB_TASKID = $LSB_TASKID"

Env Variables, cont.
• LSB_EXIT_PRE_ABORT
– A queue-level or job-level pre-execution command can exit with the value of this parameter if it wants the job to be aborted instead of being requeued or executed.

Job State(s)
• State Diagram of LSF jobs
• PEND – waiting in the queue
• RUN – dispatched to a host and running
• DONE – terminated normally

Job State(s), cont.
• A job remains pending until all conditions for its execution are met. These conditions include:
– Start time specified by the user when the job is submitted
– Load conditions on qualified hosts
– Time windows during which the job's queue can dispatch jobs and qualified hosts accept jobs
– Job limits imposed by the configured policy for each user, queue, and host
– Relative priority to other users and jobs
– Availability of the specified resources

Job State(s), cont.
• A job may terminate abnormally for various reasons. An abnormally terminated job goes into EXIT state. The situations where a job terminates abnormally include:
– The job is cancelled by its owner or the LSF administrator while pending, or after being dispatched
– The job cannot be dispatched before it reaches its termination deadline and is therefore aborted by LSF Batch

Job State(s), cont.
• The situations where a job terminates abnormally include:
– The job fails to start successfully. For example, the wrong executable is specified by the user when the job is submitted
– The job crashes during execution

Job State(s), cont.
• Jobs may also be suspended at any time. A job can be suspended by its owner, by the LSF administrator or by the LSF Batch system.
There are three different states for suspended jobs:
– PSUSP: suspended by its owner or the LSF administrator while in PEND state
– USUSP: suspended by its owner or the LSF administrator after being dispatched
– SSUSP: suspended by the LSF Batch system after being dispatched

Queue Setup
• Fairshare with queue-based pre-emption possible
• Short
– Up to 12 hours
– Can pre-empt other queues
• Long > 12 hours
• Interactive
– Up to 12 hours
– Can pre-empt other queues
• Parallel
– Used for true parallel jobs

Parallel versus single
• What is a single job?
– Uses one core/slot and runs in a single namespace
• What is a parallel job?
– Uses more than one core/slot and runs in a single or multiple namespace context
• What is a threaded job?
– A parallel job which runs in a single namespace
• What is an MPI job?
– A parallel job which runs in multiple namespaces

Resource Management

Why manage resources?
• Accurately listing requirements means your job gets dispatched faster
• Does your job scale well?
– Doubling the core count may not halve run time
• Can it be distributed?

Resource Requests
• Submitting a job with basic resources:
• Memory
– Using the -R option
• Request ~16GB RAM
    $ bsub -R "rusage[mem=16000]" script.sh

Resource Requests, cont.
• How much time does your job need?
– Estimate the amount of time at first
– If you know it takes 1 hour for 1 CHR, and you are analyzing 4 CHR, then figure 5 hours and see where things end up
• Use the -W H:M resource request:
– #BSUB -W 5:00

Resource Requests, cont.
• Which queue do we want to run in?
– Recall the different types of queues and select one
– Default is long
• Using -q {queue name} we select the queue
• #BSUB -q short

Resource Requests, cont.
• How do I run a threaded application?
– All cores/slots need to be on the same host
– All memory is in the same namespace
• Use the -R "span" resource request to make sure that you have ALL cores on ONE host:
– #BSUB -R "span[hosts=1]"

Resource Requests, cont.
• How do I run an MPI job?
– Cores/slots can reside on any host
    • The optimal setup is to run on the same host as much as possible
– Namespace is independent
– Load the MPI module
• -R "span" with ptile
• Example: 12 slots with 6 slots for each host
– #BSUB -n 12
– #BSUB -R "span[ptile=6]"

Ex: LSF Submission with resources
Say we need 15GB of available RAM and 4 slots/CPUs for our job to run. We still use the bsub command, but with options:
• -R rusage[mem=X]
– Request X MB of RAM
• -n X
– Request X slots (cores)
– Job slots are not guaranteed to be on the same system unless we tell LSF to place them there
    $ bsub -R "rusage[mem=15360]" -n 4 ./vmd.sh

Measuring resources used
• Job report contains memory and CPU used
• Also contains the amount of time the job was pending and running
• GHPCC optimized for short (<12H) jobs
• Test software with larger requirements and use that as a gauge
• Run production with slightly scaled up numbers

LSF based array of jobs
• Say you have 1000 files that you need to process and they're numbered sequentially, file 1 through file 1000
    bsub -W 1:0 -R "rusage[mem=1024]" \
        -J "myarray[1-1000]" \
        "process file.\$LSB_JOBINDEX"
• This will run jobs 1..1000, one slot per job

LSF based resource requests
• Non-threaded job example:
– A job requiring 4 cores, a run time of 50 minutes, and 1GB of memory per job slot:
    bsub -q short -n 4 -W 0:50 \
        -R "rusage[mem=1024]" \
        -R "span[hosts=1]" ./myparalleljob.sh
• Here we run in the short queue as we are looking for singleton based jobs
Resource Requests, cont.
• Threaded jobs
• A job requiring 4 cores, a run time of 50 minutes, and 1GB of memory per job slot:
    bsub -q parallel -n 4 -W 0:50 \
        -R "span[hosts=1]" \
        -R "rusage[mem=1024]" ./mythreadedjob.sh
• Here we request the parallel queue because this job, although it does not run on separate hosts, is considered parallel in nature

Resource Requests, cont.
• GPU resource requests
– Single GPU:
    • #BSUB -q gpu -a gpuexcl_p
– Both GPUs on a host:
    • #BSUB -q gpu -a gpuexcl2_p
– Resource requests for memory and time are still the same here
– Note: these hosts are x86 Intel nodes, not AMD nodes
– If you are using CUDA, don’t forget to load the module

Using the right queue

Why use the correct queue?
• Match requirements to resources
• Jobs dispatch quicker
• Better for entire cluster
• Help GHPCC staff determine when new resources are needed

Short
• Cluster currently optimized for short jobs
• Easier to get jobs dispatched
• If your job can be split into smaller chunks, do it
• Limited to 80% of cores

Parallel
• For jobs that can be split across multiple systems
• Usually MPI jobs
• Usually fast dispatch
• Doesn’t get suspended
• Limited to 80% of cores

Long
• Don’t be afraid to submit long jobs
• Usually lacking in resources
• Run for 30 days
• Parallel threaded applications that run on the same node: "span[hosts=1]"

Interactive
• Get your own CPU and memory allocation
• Great for compiling software
• Or downloading data
• Or testing software
• Limited to 5 concurrent jobs per user
• Limited to 8 hours per job

Scripting Jobs

Using bash with LSF
• Your shell script can include LSF commands
• Or write a separate wrapper
• Reduce redundancy
• No errors in submitting

Submitting scripts to LSF
• You can include bsub options in a shell script at the top of the file
• Prefix each line with #BSUB
• One option per line
• Script will run as normal without LSF, so you can test it
• Rest of script is a normal shell script

Demo
    #!/bin/bash
    #BSUB -W 00:10
    #BSUB -n 1
    #BSUB -R "rusage[mem=1024]"
    #BSUB -J "myTask[1-80]"
    #BSUB -o logs/out.%J.%I
    echo "Hello Job $LSB_JOBID Task $LSB_JOBINDEX"

Passing options to shell scripts
• This works:
    bsub myjob.sh opt1 opt2 opt3
• This won’t:
    bsub < myjob.sh opt1 opt2 opt3
• Why and now what?

Passing options to shell scripts
• Write a wrapper for bsub that passes arguments
    #!/bin/bash
    bsub -q short myjob.sh "$1" "$2" "$3"
• Then run
    ./mywrapper.sh opt1 opt2 opt3

Exit codes
• Non-zero exit = EXIT
• Zero exit = DONE
• Tracked separately by LSF, may be in metrics
• Not all jobs with a non-zero exit really failed
• Up to you to track jobs
– But contact us if you have questions!
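
As a minimal sketch of the exit-code rule above, the helper below makes sure a failing step turns into a non-zero exit status for the job script, so that LSF would report the job as EXIT rather than DONE. The helper name run_step and the commented example commands are hypothetical and only for illustration; the sketch assumes nothing about LSF beyond what the slide states.

```shell
#!/bin/bash
# Sketch of a job script helper that reports failures through the exit status.
# run_step is a hypothetical helper for illustration, not an LSF command.

run_step() {
    # Run the given command; if it fails, log the status to stderr
    # and hand the same non-zero status back to the caller.
    "$@"
    local rc=$?
    if [ "$rc" -ne 0 ]; then
        echo "step failed with status $rc: $*" >&2
    fi
    return "$rc"
}

# In a job script each step would be chained so that the first failure
# becomes the script's exit status, for example (hypothetical commands):
#   run_step ./prepare.sh || exit $?
#   run_step ./analyze.sh || exit $?

# Harmless demonstration that works without LSF:
run_step true && echo "zero exit: LSF would report DONE"
```

Since not every non-zero exit is a real failure, the message written to stderr ends up in the job's error file and can help decide which EXIT jobs actually need to be rerun.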

Tips and Tricks

LSF summary commands
• We have created some convenient tools which display LSF and other system command output
• Must load hpctools/1.0.0 module
• cju
    [awr@ghpcc06 ~]$ cju
    Username  #JOBS    #Cores in use  #Jobs Queued   Full Name        Phone #       PI (campus_PI)
    ------------------------------------------------------------------------------------------------
    hg49a     3 jobs   2208 core(s)   0 Jobs Q/Susp  Gao,Haiying      xxx-xxx-xxxx  uma_haiying_gao
    fm19w     2 jobs   256 core(s)    0 Jobs Q/Susp  Massi,Francesca  xxx-xxx-xxxx  umw_francesca_massi
    lb33d     3 jobs   120 core(s)    0 Jobs Q/Susp  Berard,Leandre   xx-xxx-xxxx   umd_mehdi_raessi
    nnk60a    82 jobs  82 core(s)     0 Jobs Q/Susp  Khan,Navaid      xxx-xxx-xxxx  uma_peter_monson
    sc45a     13 jobs  13 core(s)     0 Jobs Q/Susp  Chien,Szu-Chia   xxx-xxx-xxxx  uma_peter_monson
    cjh       1 jobs   1 core(s)      4 Jobs Q/Susp  Hull,Chris       xxx-xxx-xxxx  umw_rc
    Total cores used: 2679
    Total cores available: 2336

LSF summary commands
• More summary commands:
    $ ac – Available CPUs
    $ tc – Total CPUs
    $ uc – Used CPUs
    $ home_used
• See our wiki for real time stats: http://wiki.umassrc.org/

HPC Storage
• Disk usage best practices
• Archive your data
– Make backups of your data on mid-long term storage
• Use local storage if possible
– Local storage is always faster than network storage
• Don’t use farline for cluster processing

HPC Best practices
• When submitting a large number of jobs please consider:
– Single CPU jobs versus multi CPU jobs
– Correct amount of memory for your job
– Job arrays
– Job dependencies

Job dependencies
• Have jobs not start until specified events happen
• Lots of options for the event: done, exit
• Jobs are in the queue, but not eligible to dispatch until dependencies are met
• Be sure to track and remove ‘dead’ jobs

Job dependency examples
Wait until jobid 174333 successfully completes:
    $ bsub sleep 600
    $ bsub -w "done(174333)" echo "hello"
Wait until all jobs in array have finished:
    $ bsub -J "myjob[1-100]" sleep 60
    $ bsub -w "done(myjob[*])" echo "hello"

HPC Best practices cont.
• The earlier your jobs are submitted, the earlier they will gain the LSF resources they need.
• Re-direct all LSF output to one directory for convenience
• Add the following to your LSF / job directives (redirects stdout/stderr); the second form adds %I, the array index, for job arrays:
    #BSUB -o $HOME/LSF_jobs_output/LSF_job.%J.out
    #BSUB -o $HOME/LSF_jobs_output/LSF_job.%J.%I.out

HPC Best practices cont.
LSF Queues and policies
• Fair share attempts to equalize CPU (slot) resources for labs and users at job submission.
• The priority of a job is calculated in relation to other submitted jobs. The priority for jobs will change as jobs complete and job slots become available
• All labs start with an equal weight
• Each lab member shares in this weight when submitting jobs
• Weights are measured from job submissions per user and per lab
• Weights are based on CPU time used and a decay time

Pre-exec and post-exec
• Run commands before a job starts and after it completes
• Great for copying data to and from local disk
• -E for pre_exec
• -Ep for post_exec

Using pre-exec and post-exec
• During your job execution
– Pre exec:
    • Create a folder layout /tmp/jobid_$LSB_JOBID
    • Transfer data files if needed
– Post exec:
    • Transfer data files back to central storage
    • Remove files which you no longer require
    • Make offsite backup copy

Q&A
• Don’t be shy – everyone has a different work flow and we want to help you
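
The pre-exec and post-exec steps from the "Using pre-exec and post-exec" slide can be sketched as two small shell functions. This is an illustration only: the function names stage_in and stage_out, the SCRATCH_ROOT and JOB_SCRATCH variables, and the input/output subdirectories are hypothetical, and the snippet falls back to the shell's PID when LSB_JOBID is unset so it can be tried outside LSF.

```shell
#!/bin/bash
# Sketch: stage data to node-local scratch before a job, copy results back after.
# Hypothetical names: stage_in, stage_out, SCRATCH_ROOT, JOB_SCRATCH.

SCRATCH_ROOT="${SCRATCH_ROOT:-/tmp}"
# LSF sets LSB_JOBID inside a job; fall back to the shell PID outside LSF.
JOB_SCRATCH="$SCRATCH_ROOT/jobid_${LSB_JOBID:-$$}"

stage_in() {
    # Pre-exec step: create the scratch layout and copy the input files into it.
    local src_dir=$1
    mkdir -p "$JOB_SCRATCH/input" "$JOB_SCRATCH/output" || return 1
    cp -r "$src_dir"/. "$JOB_SCRATCH/input/"
}

stage_out() {
    # Post-exec step: copy results to central storage, then remove the scratch
    # directory. Scratch is only removed if the copy succeeded.
    local dest_dir=$1
    mkdir -p "$dest_dir" || return 1
    cp -r "$JOB_SCRATCH/output"/. "$dest_dir/" && rm -rf "$JOB_SCRATCH"
}
```

One way to use a sketch like this is to call stage_in at the start of the job script and stage_out at the end; another is to put each step in its own small script and pass those with -E and -Ep as the slides describe.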