GHC(STG,Cmm,asm) illustrated for hardware persons exploring some mental models and implementations Takenobu T. Rev. 0.01.1 WIP “Any sufficiently advanced technology is indistinguishable from magic.” Arthur C. Clarke NOTE - This is not an official document by the ghc development team. - Please don't forget “semantics”. It's very important. - This is written for ghc 7.8 (and ghc 7.10). Contents - Executable binary - Compile steps - Runtime System - Development languages - Machine layer/models - STG-machine - Heap objects in STG-machine - STG-machine evaluation - Pointer tagging - Thunk and update - Allocate and free heap objects - STG - C land interface - Thread - Thread context switch - Creating main and sub threads - Thread migration - Heap and Threads - Threads and GC - Bound thread - Spark - Mvar - Software transactional memory - FFI - IO and FFI - IO manager - Bootstrap - References Executable binary The GHC = Compiler + Runtime System (RTS) Haskell source (.hs) GHC (compile) object (.o) RuntimeSystem (libHsRts.o) libraries (GHC.Base, ...) GHC (link) Executable binary including the RTS (* static link case) References : [1], [C1], [C3], [C10], [C19], [S7] Compile steps GHC transitions between five representations Haskell language GHC compile steps Core language STG language Cmm language Assembly language (native or llvm) each intermediate code can be dumped by : % ghc -ddump-parsed % ghc -ddump-rn % ghc -ddump-ds % ghc -ddump-simpl % ghc -ddump-prep % ghc -ddump-stg % ghc -ddump-cmm % ghc -ddump-opt-cmm % ghc -ddump-llvm % ghc -ddump-asm References : [1], [C3], [C4], [9], [C5], [C6], [C7], [C8[], [S7], [S8] Runtime System Generated binary includes the RTS Haskell user code executable binary Runtime System software hardware Library (* static link case) OS (Linux, FreeBSD, Win, ...) Physical Processor (x86, ARM, ...) References : [C10], [9] Runtime System includes ... Runtime System Storage Manager Byte-code interpreter Software Transactional Memory User space Scheduler Profiling ... References : [C10], [8], [9], [5], [17], [S13] Development languages The GHC is developed by some languages compiler runtime system ( $(TOP)/compiler/*) ( $(TOP)/rts/*) Haskell + Alex (lex) Happy (yacc) Cmm (C--) Assembly C + Cmm Assembly library ( $(TOP)/libraries/*) Haskell + C References : [C2] Machine layer/models Machine layer STG-machine (Abstract machine) HEC - Haskell Execution Context (Capability, Virtual processor) Physical Processor (x86, ARM, ...) Each Haskell code is executed in STG semantics. References : [C14], [C6], [2], [C17], [8], [S15], [S16], [S11] Machine layer STG Registers STG-machine (Abstract machine) Stack Heap Static region Heap Static region BaseReg R1 Hp Sp : Register table HEC - Haskell Execution Context (Capability, Virtual processor) BaseReg R1 Hp : Registers Physical Processor (x86, ARM, ...) Memory ebx esi edi ebp : References : [C14], [C6], [2], [C17], [8], [S15], [S16], [S11] Runtime system and HEC user code Haskell user code by STG-semantics STG-machine runtime system HECs user space Runtime System OS Threads (native threads, kernel threads) OS Process OS OS (Linux, FreeBSD, Win, ...) supervisor space (kernel space) hardware Physical Processor (x86, ARM, ...) References : [C14], [C6], [2], [C17], [8], [S15], [S16], [S11] many HECs Multi HECs can be generated by compile and runtime options : $ ghc -rtsopts -threaded $ ./xxx +RTS -N4 HEC (Capability, Virtual processor) Task (Worker Thread) HEC HEC HEC HEC Tasks (abstract OS Thread) OS Threads OS Process software hardware OS (Linux, FreeBSD, Win, ...) Physical Processor (x86, ARM, ...) References : [1], [5], [8], [9], [14], [C17], [C11], [19], [S17], [S16], [S23], [S22], [S14] HEC (Capability) data structure [rts/Capability.h] (ghc 7.8) struct Capability_ { StgFunTable f; StgRegTable r; register table nat no; Task *running_task; rtsBool in_haskell; nat idle; run queue rtsBool disabled; StgTSO *run_queue_hd; StgTSO *run_queue_tl; InCall *suspended_ccalls; bdescr **mut_lists; bdescr **saved_mut_lists; bdescr *pinned_object_block; bdescr *pinned_object_blocks; int context_switch; int interrupt; #if defined(THREADED_RTS) Task *spare_workers; nat n_spare_workers; Mutex lock; Task *returning_tasks_hd; Task *returning_tasks_tl; Message *inbox; SparkPool *sparks; SparkCounters spark_stats; #endif } W_ total_allocated; StgTVarWatchQueue *free_tvar_watch_queues; StgInvariantCheckQueue *free_invariant_check_queues; StgTRecChunk *free_trec_chunks; StgTRecHeader *free_trec_headers; nat transaction_tokens; Each HEC (Capability) has a register table and a run queue and ... Each HEC (Capability) is initialized at initCapabilities [rts/Capability.c] References : [S15], [S16], [C11], [C17] STG-machine The STG-machine consists of three parts STG Registers BaseReg R1 Hp HpLim Sp SpLim : Stack Sp grows downwards SpLim Heap HpLim Hp grows upwards References : [2], [C15], [C11], [C12] STG-machine is mapped to physical processor logical view STG Registers physical view physical register (x86 example) Register table BaseReg R1 Hp Sp ebx esi edi ebp ebx esi edi : : ... References : [C15], [S1], [S2] STG-machine is mapped to physical processor logical view physical view Heap memory Stack TSO Thread State Object Heap Stack A stack and a TSO object are in the heap. The stack is stored separately from the TSO for size extension and GC. References : [C11], [C12], [S16], [S5] TSO data structure [includes/rts/storage/TSO.h] (ghc 7.8) typedef struct StgTSO_ { StgHeader header; struct StgTSO_* _link; struct StgTSO_* global_link; struct StgStack_ *stackobj; link to stack object StgWord16 what_next; StgWord16 why_blocked; StgWord32 flags; StgTSOBlockInfo block_info; StgThreadID id; StgWord32 saved_errno; StgWord32 dirty; struct InCall_* bound; struct Capability_* cap; struct StgTRecHeader_ * trec; struct MessageThrowTo_ * blocked_exceptions; struct StgBlockingQueue_ *bq; StgWord32 tot_stack_size; } *StgTSOPtr; A TSO object is only ~17words + stack. Lightweight! References : [S5] Heap objects in STG-machine Every heap object is represented uniformly header payload info ptr info table meta data entry code actual machine code Closure (header + payload) + Info Table + Entry Code References : [C11], [S3], [S4], [S6], [2] Heap object (closure) logical view header payload info ptr physical view heap memory payload1 payload0 info ptr info table static memory entry code info table entry code References : [C11], [S3], [C9], [C8], [2] Closure examples : Char, Int 'a' :: Char header C# info ptr 7 :: Int payload header 'a‘# I# GHC.Types.C#_static_info layout : 0_1 type : CONSTR bitmap : inc %esi jmp *0x0(%ebp) info ptr payload 7# GHC.Types.I#_static_info info table layout : 0_1 type : CONSTR bitmap : info table entry code inc %esi jmp *0x0(%ebp) entry code References : [C11], [S3], [C9], [C8], [2], [S20] Closure example (code) [Example.hs] module Example where value1 :: Int value1 = 7 [ghc -O -ddump-stg Example.hs] STG Example.value1 :: GHC.Types.Int [GblId, Caf=NoCafRefs, Str=DmdType m, Unf=OtherCon []] = NO_CCS GHC.Types.I#! [8]; Cmm [ghc -O -ddump-opt-cmm Example.hs] section "data" { __stginit_main:Example: } section "data" { Example.value1_closure: const GHC.Types.I#_static_info; const 7; } section "relreadonly" { SMc_srt: } [ghc -O -ddump-asm Example.hs] .data asm .align 4 .align 1 .globl __stginit_main:Example __stginit_main:Example: .data .align 4 .align 1 .globl Example.value1_closure Example.value1_closure: .long GHC.Types.I#_static_info .long 7 .section .data header .align 4 I# .align 1 SMd_srt: payload 7# References : [C11], [S3], [C9], [C8], [2], [S20] Closure examples : Maybe Just 7 :: Maybe Int header payload header Just info ptr I# Data.Maybe.Just_static_info layout : type : CONSTR bitmap : add jmp info table $0x2,%esi entry code *0x0(%ebp) info ptr payload 7# GHC.Types.I#_static_info layout : type : CONSTR bitmap : info table inc %esi entry code jmp *0x0(%ebp) References : [C11], [S3], [C9], [C8], [2], [S20] Closure examples : List [ 1, 2 ] :: [Int] header Cons info ptr GHC.Types.[]_closure payload I# Cons 1# GHC.Types.:_static_info layout : type : CONSTR bitmap : add jmp $0x2,%esi *0x0(%ebp) Nil I# 2# GHC.Types.[]_static_info layout : type : CONSTR bitmap : inc %esi jmp *0x0(%ebp) References : [C11], [S3], [C9], [C8], [2], [S20] Closure examples : Thunk "thunk" x + 1 :: Int (free variable : x = 7) header x+1 header payload reserved info ptr x I# payload 7# info ptr info table type : THUNK info table type : CONSTR entry code for λx -> x + 1 entry code for I# References : [C11], [S3], [C9], [C8], [2], [S20] STG-machine evaluation STG evaluation flow stack Sp continuation stack current expression R1 (1) push a continuation code (next code) to the stack top (2) enter to R1 closure jump a value Sp continuation R1 (3) set a result to R1 (4) jump (return) to the stack top code (5) repeat from (1) References : [C8], [3] Enter to a closure unevaluated closure R1 (1) read R1 to get a closure address (2) read header(info ptr) to get a Entry code address (3) jump to the Entry code address (4) execute the Entry code (5) set a result to R1 header payload info ptr layout closure type ... if ((Sp + -12) < SpLim) goto c3h9; else goto c3ha; : Info table Entry code (6) jump to the stack top address (continuation) References : [C11], [C9], [C8], [10], [3], [2], [12] Pointer tagging Pointer tagging header (info ptr) pointer payload R1 or ... pointer 00 01 10 11 ... an unevaluated closure ... an evaluated closure; 1st constructor value or evaluated. (for instance: "Nothing" ) ... an evaluated closure; 2nd constructor value. (for instance: "Just xx") ... an evaluated closure; 3rd constructor value. * 32bit machine case fast judgment! check only pointer's lower bits without evaluating the closure. References : [4], [2], [C16] Thunk and update Thunk and update "thunk" x + 1 :: Int (free variable : x = 7) payload header Thunk info ptr x type : THUNK (empty) I# #7 evaluate ( x + 1 ) x+1 Thunk info ptr (empty) x type : THUNK x+1 I# #7 I# #8 update Ind (indirect) info ptr type : THUNK x+1 lock free I# #7 I# #8 type : IND GC (eliminate Indirect) References : [3], [2], [C8] Allocate and free heap objects Allocate heap objects heap memory (nursery) HpLim Hp allocate (without malloc) HpLim Hp can't allocate because full HpLim Hp if (Hp > HpLim ) goto ... call stg_gc_... References : [C11], [C13], [8], [9], [5], [15], [12], [13], [19], [S25] free and collect heap objects HpLim copying collection (minor GC) from space if (Hp > HpLim ) goto ... call stg_gc_... to space References : [C11], [C13], [8], [9], [5], [15], [12], [13], [19], [S25] STG - C land interface STG (Haskell) land - C land interface User code R1 STG land (Haskell land) C land RtsAPI StgRun StgReturn result function f BaseReg Runtime System (Scheduler) References : [S18], [S17], [S19], [S21] Thread Thread layer (single core) Haskell Threads Haskell Threads HEC (Capability, Virtual processor) exclusive execution ... HEC user space OS Thread OS Process OS software (Linux, FreeBSD, Win, ...) hardware Physical Processor supervisor space (x86, ARM, ...) References : [5], [8], [9], [14], [C17], [C11], [19], [S17], [S16], [S23], [S22], [S14] Thread layer (multi core) Haskell Threads Haskell Threads Haskell Threads HEC (Capability, Virtual processor) ... ... HEC HEC user space OS Thread OS Thread OS Process software hardware supervisor space OS (Linux, FreeBSD, Win, ...) Physical Processor Physical Processor (x86, ARM, ...) (x86, ARM, ...) *Threaded option case (ghc -threaded) References : [5], [8], [9], [14], [C17], [C11], [19], [S17], [S16], [S23], [S22], [S14] Thread context switch Threads and context switch logical view Thread states Thread #0 Thread #1 Registers Registers Registers Stack Stack Stack load state STG-machine Registers Stack Thread #2 ... interleaved exclusive execution save state execution and pre-empted via the context switch heap References : [5], [8], [9], [14], [C17], [C11], [19], [S17], [S16], [S23], [S22], [S14] Threads and TSOs logical view Thread states Thread #0 Thread #1 Registers Registers Stack load state STG-machine Stack save state physical view Thread #2 heap memory Registers Stack ... TSO #0 (Thread State Object) TSO #1 (Thread State Object) Registers Stack TSO #2 (Thread State Object) heap References : [5], [8], [9], [14], [C17], [C11], [19], [S17], [S16], [S23], [S22], [S14] Scheduling by run queue STG-machine Registers Stack STG land (Haskell land) StgRun C land StgReturn Scheduler run queue popRunQueue appendToRunQueue round robin ... heap TSO TSO ... References : [5], [8], [9], [14], [C17], [C11], [19], [S17], [S16], [S23], [S22], [S14] Context switch flow Haskell thread heap heap HpLim -> 0 (2) HpLim -> 0 RTS Interrupt (8) context switch (3) heap size check if (Hp > HpLim) TSO (4) call GC TSO GC Scheduler (5) HpLim check HpLim = 0 ? (7) scheduling (6) goto schedule (1) interrupt platform OS timer context switch at safe points References : [5], [8], [9], [14], [C17], [C11], [19], [S17], [S16], [S23], [S22], [S14], [S24] Context switch flow (code) stg_gc_noregs if (HpLim == 0) { jump stg_returnToSched [R1]; STG land (Haskell land) C land cap->r.rHpLim = NULL; stopCapability stg_returnToSched W_ r1; r1 = R1; // foreign calls may clobber R1 SAVE_THREAD_STATE(); foreign "C" threadPaused(MyCapability() "ptr", CurrentTSO); R1 = r1; jump StgReturn [R1]; schedule contextSwitchCapability contextSwitchAllCapabilities handle_tick next handle_tick .. CreateTimerQueue initTicker initTimer startTimer hs_init_ghc real_main OS *Windows case References : [5], [8], [9], [14], [C17], [C11], [19], [S17], [S16], [S23], [S22], [S14], [S24] Creating main and sub threads Create a main thread Runtime System Runtime system bootstrap code [rts/RtsAPI.c] rts_evalLazyIO createIOThread createThread ... (1), (2), (3) pushClosure ... (4) scheduleWaitThread appendToRunQueue ... (5) scheduler run queue (5) (1) TSO heap memory (2) stack (3) *stackobj closure (4) static memory ZCMain_main_closure header payload info table entry code References : [5], [8], [9], [14], [C17], [C11], [19], [S17], [S16], [S23], [S22], [S14], [S24] Create a sub thread using forkIO Haskell Threads forkIO stg_forkzh ccall createIOThread ... (1), (2), (3), (4) ccall scheduleThread ... (5) User code Runtime System append (5) scheduler run queue (1) TSO heap memory forked closure (2) *stackobj (3) stack (4) header payload closure info table static memory entry code References : [5], [8], [9], [14], [C17], [C11], [19], [S17], [S16], [S23], [S22], [S14], [S24] Thread migration Threads are migrated to idle HECs Haskell Threads Idle HEC run queue Idle HEC run queue run queue Idle HEC run queue empty empty empty HEC HEC HEC HEC OS Thread OS Thread OS Thread OS Thread Physical Processor Physical Processor Work pushing OS Process hardware Physical Processor Physical Processor References : [5], [8], [9], [14], [C17], [C18], [S17], [S16], [S23], [S24] Heap and Threads Threads shared a heap Haskell Threads Registers Registers Stack Stack ... Registers Registers Stack Stack ... heap memory (shared) static memory (shared) hardware HEC HEC Physical Processor Physical Processor References : [5], [8], [9], [14], [C17], [C11], [19], [S17], [S16], [S23], [S22], [S14], [S17], [S16], [S25] Local allocation area (nursery) Haskell Threads heap memory (shared) Registers Registers Stack Stack Registers Stack Stack nursery ... nursery generation N static memory (shared) hardware ... Registers static memory HEC HEC Physical Processor Physical Processor fast access using nursery for each processors References : [5], [8], [9], [14], [C17], [C11], [19], [S17], [S16], [S23], [S22], [S14], [S17], [S16], [S25] Threads and GC GC, nursery, generation, aging, promotion Haskell Thread STG land (Haskell land) Runtime System allocate generation 0, step 0 Hp nursery HEC nursery aging HEC nursery HEC nursery HEC generation 0, step 1 promotion generation 1 heap memory References : [8], [9], [15], [C13], [C11], [S25] Threads and minor GC sequential GC for young generation (minor GC) “stop-the-world” GC Haskell Threads GC thread GC thread GC thread GC thread nursery nursery nursery nursery generation 0, step 1 heap generation 1 HEC HEC HEC HEC Physical Processor Physical Processor Physical Processor Physical Processor References : [8], [9], [15], [C13], [C11], [S25] Threads and major GC parallel GC for oldest generation (major GC) “stop-the-world” GC Haskell Threads GC thread GC thread GC thread GC thread nursery nursery nursery nursery generation 0, step 1 heap generation 1 HEC HEC HEC HEC Physical Processor Physical Processor Physical Processor Physical Processor References : [8], [9], [15], [C13], [C11], [S25] GC discover live objects from the root Runtime System scheduler run queue root heap memory stack closure TSO TSO closure stack TSO closure reachable from root GC aging or promote unreachable (garbage) free References : [8], [9], [15], [C13], [C11], [S25] Bound thread A bound thread has a fixed associated OS Thread forkOS Bound Thread Haskell Threads HEC (Capability, Virtual processor) fixed association for safe foreign calls HEC OS Thread OS Thread OS Process Foreign calls from a bound thread are all made by the same OS thread. A bound thread is created using forkOS. The main thread is bound thread. References : [6], [5], [8], [9], [14], [C17], [19], [S17], [S16], [S23], [S22] forkIO, forkOn, forkOS forkIO forkOS forkOn Bound Threads Haskell Threads affinity bound HEC HEC HEC OS Thread OS Thread OS Thread create a haskell unbound thread create a haskell unbound thread on the specified HEC OS Thread create a haskell bound thread and an OS thread References : [6], [5], [8], [9], [14], [C17], [19], [S17], [S16], [S23], [S22] Spark Spark layer serial execution on each Spark Threads Sparks Haskell Thread HEC (Capability, Virtual processor) ... Spark Thread HEC user space OS Thread OS Process OS software (Linux, FreeBSD, Win, ...) hardware Physical Processor supervisor space (x86, ARM, ...) Spark Threads are generated on idle HECs. References : [C17], [19], [S17], [S26], [S27], [S33], [S12] Sparks and Spark pool logical view Spark Spark Spark Spark Spark Thread Spark Thread Spark Thread Spark Thread HEC HEC HEC HEC Physical Processor Physical Processor Physical Processor Physical Processor Spark pool rpar Spark (Thunk) References : [C17], [19], [S17], [S26], [S27], [S33], [S12] Spark pool and work stealing physical view Spark Spark Spark Thread Spark Spark Thread Spark Spark Thread Spark Thread Spark pool Spark pool Work stealing HEC Spark pool Spark pool References : [C17], [19], [S17], [S26], [S27], [S33], [S12] Sparks and closures Spark Thread runSparks STG land (Haskell land) getSpark C land Spark pool (WSDeque) HEC rpar push ... heap closure (thunk) closure (thunk) ... (not TSO objects, but closures. therefore very lightweight) References : [C17], [19], [S17], [S26], [S27], [S33], [S12] MVar MVar Haskell Thread #0 Haskell Thread #1 putMVar takeMVar empty? or full? MVar References : [16], [18], [19], [S31], [S12] MVar and blocking Haskell Thread Haskell Thread putMVar BLOCKED if full takeMVar full empty MVar MVar BLOCKED if empty References : [16], [18], [19], [S31], [S12] MVar example Haskell Thread #0 Haskell Thread #1 (3) takeMVar (2) (1) putMVar MVar timer context switch time Thread #0 Running Blocked Non Blocked takeMVar (1) MVar Running (3) wakeup and takeMVar (atomic) empty full putMVar empty (2) Running Thread #1 * single core case References : [16], [18], [19], [S31], [S12] MVar object view User view logical MVar object heap MVar empty? or full? physical MVar object StgReturn FIFO of StgMVarTSOQueue head tail head tail value closure value StgMVarTSOQueue StgMVarTSOQueue TSO TSO References : [16], [18], [19], [S31], [S12] newEmptyMVar Haskell Threads newEmptyMVar newMVar# (1) call the Runtime primitive Runtime System stg_newMVarzh ALLOC_PRIM_ SET_HDR StgMVar_head StgMVar_tail StgMVar_value (2) create a MVar object in the heap MVar object heap stg_END_TSO_QUEUE_closure head tail value (3) link each fields References : [16], [18], [19], [S31], [S12] takeMVar (empty case) Haskell Threads takeMVar takeMVar# Runtime System stg_takeMVarzh create StgMVarTSOQueue … (1) append … (2) StgReturn … (3) (3) return to the scheduler (1) create StgMVarTSOQueue FIFO of StgMVarTSOQueue head (2) append tail value MVar object References : [16], [18], [19], [S31], [S12] takeMVar (full case) Haskell Threads takeMVar takeMVar# Runtime System stg_takeMVarzh (1) get value (2) set empty (3) remove head (4) tryWakeupThread (3) remove head scheduler run queue fairness round robin head append tail value (1) get value (2) set empty (4) wakeup Only one of the blocked threads becomes unblocked. MVar object References : [16], [18], [19], [S31], [S12] Software transactional memory Create a atomic block using atomically atomically :: STM a -> IO a atomically STM a writeTVar writeTVar readTVar readTVar TVar readTVar TVar transactional variable Create and evaluate a composable “atomic block” Atomic block = All or Nothing References : [17], [19], [20], [C18], [S12], [S28] Rollback and blocking control using retry retry :: STM a STM a writeTVar retry :: STM a readTVar TVar TVar Discard, blocking and try again References : [17], [19], [20], [C18], [S12], [S28] Compose OR case using orElse orElse :: STM a -> STM a -> STM a STM a STM a A A orElse :: STM a result STM a -> STM a -> STM a B if no retry A or B if retry A A or B or Nothing References : [17], [19], [20], [C18], [S12], [S28] STM, TVar example (normal case) STM a writeTVar TVar TVar atomic block (critical section) time Thread #0 writeTVar writeTVar writeTVar commit TVar #A old value new value TVar #B old value new value atomic update References : [17], [19], [20], [C18], [S12], [S28] STM, TVar example (conflict case) atomic block rollback and try again atomic block time Thread #0 commit old value TVar #A TVar #B commit old value new value other value new value commit other thread different References : [17], [19], [20], [C18], [S12], [S28] retry example atomic block retry atomic block time blocked Thread #0 wake up TVar #A TVar #B commit old value new value old value changed value new value commit other thread References : [17], [19], [20], [C18], [S12], [S28] STM, TVar data structure retry atomically newTVar StgTVar StgTVar writeTVar readTVar StgTVarWatchQueue CurrentTSO StgTVarWatchQueue … TRecEntry TRecEntry StgTRecHeader StgTRecChunk atomically StgInvariantCheckQueue References : [17], [19], [20], [C18], [S12], [S28] newTVar, writeTVar, readTVar newTVar writeTVar readTVar or Runtime System TRecEntry (transaction Record) TRecEntry TRecEntry invariant check when commit TVar StgTVar heap References : [17], [19], [20], [C18], [S12], [S28] block by retry, wake up by commit retry :: STM a Runtime System commit by other thread (4) commit (1) retry TSO StgTVar (3)yield scheduler run queue WatchQueue (2) append TVar … (5)append … (4) commit wakeup all threads in the WatchQueue no guarantee of fairness, because the RTS has to run all the blocked transaction. References : [17], [19], [20], [C18], [S12], [S28] FFI FFI (Foreign Function Interface) Haskell thread ccall STG land (Haskell land) foreign out-call StgReturn Stg interface C land FFI foreign in-call StgRun Scheduler FFI management user space Runtime system Foreign C code OS API (system call) OS supervisor space References : [6], [11], [20], [S39], [S38], [S37], [S36], [S40] FFI and OS Threads Haskell Threads (4) Haskell Threads non-blocked (2) (1) HEC HEC (5) Foreign C code OS Thread Physical Processor (1) a safe foreign call (FFI) HEC Foreign C code (3) OS Thread OS Thread Physical Processor (2) move the HEC to other OS thread (3) spawn or draw an OS thread (4) move Haskell threads (5) call the foreign C code References : [6], [11], [20], [S39], [S38], [S37], [S36], [S40] A safe foreign call (code) Haskell Threads ccall suspendThread ccall FOREIGN_C_CODE ccall resumeThread releaseCapability_ giveCapabilityToTask startWorkerTask createOSThread … (3) non-blocked Haskell Threads (3) waitForReturnCapability … (1) … (2) … (4) (1) HEC HEC (4) Foreign C code (2) OS Thread OS Thread Physical Processor References : [6], [11], [20], [S39], [S38], [S37], [S36], [S40] a safe and an unsafe foreign call a safe foreign call an unsafe foreign call non-blocked Haskell Threads blocking safe foreign call blocked Haskell Threads HEC HEC Foreign C code blocking unsafe foreign call HEC Foreign C code OS Thread OS Thread Physical Processor OS Thread Physical Processor faster, but blocking to the other Haskell threads References : [6], [11], [20], [S39], [S38], [S37], [S36], [S40] Safe/unsafe foreign call and bound/unbound thread a safe foreign call non-blocked unbound thread an unsafe foreign call non-blocked non-blocked ccall an unbound thread HEC #0 Processor #0 non-blocked blocked ccall HEC #1 HEC #1 HEC #0 Foreign C code OS Thread unbound thread OS Thread Foreign C code OS Thread Processor #1 non-blocked HEC #1 bound thread OS Thread OS Thread Processor #0 Processor #1 non-blocked blocked ccall a bound thread HEC #0 HEC #1 HEC #1 ccall HEC #0 HEC #1 Foreign C code OS Thread Processor #0 OS Thread OS Thread Processor #1 bound thread HEC #1 Foreign C code OS Thread Processor #0 OS Thread OS Thread Processor #1 References : [6], [11], [20], [S39], [S38], [S37], [S36], [S40] IO and FFI IO Haskell Thread getLine (IO) IO String ? References : [6], [11] IO example: getLine getLine IO String standard IO lib STG land (Haskell land) c_read FFI (Foreign Function Interface) C land user space C IO lib (libio.a) “read” system call OS API (system call) supervisor space OS Device driver Hardware UART, serial, ... References : [6], [11], [20], [S39], [S38], [S37], [S36], [S40] IO example: getLine (code) User code getLine hGetLine hGetLineBuffered hGetLineBufferedLoop maybeFillReadBuffer getSomeCharacters readTextDevice Library Buffered.fillReadBuffer readBuf’ readBuf RawIO.read STG land (Haskell land) fdRead readRawBufferPtr c_read … switch safe/unsafe, non-threaded/ioManager FFI (Foreign Function Interface) C land read Runtime System OS API (system call) OS References : [6], [11], [20], [S39], [S38], [S37], [S36], [S40] IO manager IO manager (single core) Haskell Threads IO manager dispatcher thread HEC HEC OS Thread OS Thread affinity user space OS Process software hardware OS (Linux, FreeBSD, Win, ...) supervisor space Physical Processor (x86, ARM, ...) *Threaded option case (ghc -threaded) References : [7], [5], [8] IO manager (multi core) Haskell Threads Haskell Threads IO manager dispatcher thread IO manager dispatcher thread HEC HEC HEC HEC OS Thread OS Thread OS Thread OS Thread user space OS Process OS software hardware supervisor space (Linux, FreeBSD, Win, ...) Physical Processor Physical Processor (x86, ARM, ...) (x86, ARM, ...) *Threaded option case (ghc -threaded) References : [7], [5], [8] IO manager Haskell Threads IO manager event table blocking IO registerFD event loop request and set callback (MVar) IO access (epoll, kqueue) takeMVar (wait and wake up) putMVar HEC HEC OS Thread OS Thread OS Process *Threaded option case (ghc -threaded) system call References : [7], [5], [8], [S29], [S30], [S32], [S37], [S35], [S3] Bootstrap Bootstrap sequence ZCMain_main_closure User code stg_ap_v_info stg_enter_info loadThreadState emitLoadThreadState STG land (Haskell land) stg_returnToStackTop StgRun schedule C land scheduleWaitThread main createIOThread createThread, pushClosure rts_evalLazyIO hs_init_ghc initScheduler, initStorage, initTimer, ioManagerStart, ... real_main hs_main Runtime System (lib/libHSrts.a) mainCRTStartup (*Windows case) OS References : [S7], [S13], [S14], [S17], [S18], [S19], [S9], [S10], [S21], [S41] Exit sequence User code STG land (Haskell land) stg_stop_thread_info StgReturn C land Runtime System (lib/libHSrts.a) schedule shutdownHaskellAndExit stg_exit exit OS References : [S19], [S18], [S17] Initializing STG land (Haskell land) startIOManagerThread startIOManagerThreads ensureIOManagerIsRunning create IO Manager Threads (+RTS -N) ioManagerStart C land initTimer, startTimer create Timer initGcThreads allocNurseries storageAddCapabilities initGeneration initStorage initializing GC Threads (+RTS -N) create nurseries (+RTS -N) createOSThread newTask startWorkerTask startWorkerTasks initScheduler create OS Threads (+RTS -N) create Tasks (+RTS -N) initCapabilities create capabilities (+RTS -N) create GC generations (+RTS -GN) Runtime System hs_init_ghc real_main hs_main main OS References : [1], [S7], [S13], [S14], [S17], [S15], [S16], [S24], [S21], [S34] Create each layers User code Sparks User code Runtime system Haskell threads OS Threads software hardware newSpark forkIO forkOn forkOS stg_forkzh, createThread stg_forkOnzh, createThread forkOS_createThread OS Process OS API initCapability (Capability, Virtual processor) (Worker thread) abstract OS Thread OS rpar HECs Tasks Runtime system forkOS newTask forkOS createOSThread pthread_create fork OS (Linux, FreeBSD, Win, ...) Physical Processor (x86, ARM, ...) References : [1], [5], [8], [9], [C11], [C17], [S12], [S26], [S22], [S15], [S23] References References [1] The Glorious Glasgow Haskell Compilation System User's Guide https://downloads.haskell.org/~ghc/latest/docs/html/users_guide/index.html [2] Implementing lazy functional languages on stock hardware: the Spineless Tagless G-machine Version 2.5 http://research.microsoft.com/en-us/um/people/simonpj/Papers/spineless-tagless-gmachine.ps.gz [3] Making a Fast Curry Push/Enter vs Eval/Apply for Higher-order Languages http://research.microsoft.com/en-us/um/people/simonpj/papers/eval-apply/ [4] Faster Laziness Using Dynamic Pointer Tagging http://research.microsoft.com/en-us/um/people/simonpj/papers/ptr-tag/ptr-tagging.pdf [5] Runtime Support for Multicore Haskell http://research.microsoft.com/en-us/um/people/simonpj/papers/parallel/multicore-ghc.pdf [6] Extending the Haskell Foreign Function Interface with Concurrency http://community.haskell.org/~simonmar/papers/conc-ffi.pdf [7] Mio: A High-Performance Multicore IO Manager for GHC http://haskell.cs.yale.edu/wp-content/uploads/2013/08/hask035-voellmy.pdf [8] The GHC Runtime System web.mit.edu/~ezyang/Public/jfp-ghc-rts.pdf [9] The GHC Runtime System http://www.scs.stanford.edu/14sp-cs240h/slides/ghc-rts.pdf [10] Evaluation on the Haskell Heap http://blog.ezyang.com/2011/04/evaluation-on-the-haskell-heap/ References [11] IO evaluates the Haskell Heap http://blog.ezyang.com/2011/04/io-evaluates-the-haskell-heap/ [12] Understanding the Stack http://www.well-typed.com/blog/94/ [13] Understanding the RealWorld http://www.well-typed.com/blog/95/ [14] The GHC scheduler http://blog.ezyang.com/2013/01/the-ghc-scheduler/ [15] GHC’s Garbage Collector http://www.mm-net.org.uk/workshop190404/GHC's_Garbage_Collector.ppt [16] Concurrent Haskell http://www.haskell.org/ghc/docs/papers/concurrent-haskell.ps.gz [17] Beautiful Concurrency https://www.fpcomplete.com/school/advanced-haskell/beautiful-concurrency [18] Anatomy of an MVar operation http://blog.ezyang.com/2013/05/anatomy-of-an-mvar-operation/ [19] Parallel and Concurrent Programming in Haskell http://community.haskell.org/~simonmar/pcph/ [20] Real World Haskell http://book.realworldhaskell.org/ References The GHC Commentary [C1] [C2] [C3] [C4] [C5] [C6] [C7] [C8] [C9] [C10] [C11] [C12] [C13] [C14] [C15] [C16] [C17] [C18] [C19] https://ghc.haskell.org/trac/ghc/wiki/Commentary https://ghc.haskell.org/trac/ghc/wiki/Commentary/SourceTree https://ghc.haskell.org/trac/ghc/wiki/Commentary/Compiler https://ghc.haskell.org/trac/ghc/wiki/Commentary/Compiler/HscMain https://ghc.haskell.org/trac/ghc/wiki/Commentary/Compiler/CoreSynType https://ghc.haskell.org/trac/ghc/wiki/Commentary/Compiler/StgSynType https://ghc.haskell.org/trac/ghc/wiki/Commentary/Compiler/CmmType https://ghc.haskell.org/trac/ghc/wiki/Commentary/Compiler/GeneratedCode https://ghc.haskell.org/trac/ghc/wiki/Commentary/Compiler/SymbolNames https://ghc.haskell.org/trac/ghc/wiki/Commentary/Rts https://ghc.haskell.org/trac/ghc/wiki/Commentary/Rts/Storage/HeapObjects https://ghc.haskell.org/trac/ghc/wiki/Commentary/Rts/Storage/Stack https://ghc.haskell.org/trac/ghc/wiki/Commentary/Rts/Storage/GC https://ghc.haskell.org/trac/ghc/wiki/Commentary/Rts/HaskellExecution https://ghc.haskell.org/trac/ghc/wiki/Commentary/Rts/HaskellExecution/Registers https://ghc.haskell.org/trac/ghc/wiki/Commentary/Rts/HaskellExecution/PointerTagging https://ghc.haskell.org/trac/ghc/wiki/Commentary/Rts/Scheduler https://ghc.haskell.org/trac/ghc/wiki/Commentary/Rts/STM https://ghc.haskell.org/trac/ghc/wiki/Commentary/Libraries References Source code [S1] includes/stg/Regs.h [S2] includes/stg/MachRegs.h [S3] includes/rts/storage/ClosureTypes.h [S4] includes/rts/storage/Closures.h [S5] includes/rts/storage/TSO.h [S6] includes/rts/storage/InfoTables.h [S7] compiler/main/DriverPipeline.hs [S8] compiler/main/HscMain.hs [S9] compiler/cmm/CmmParse.y.source [S10] compiler/codeGen/StgCmmForeign.hs [S11] compiler/codeGen/Stg*.hs [S12] rts/PrimOps.cmm [S13] rts/RtsMain.c [S14] rts/RtsAPI.c [S15] rts/Capability.h [S16] rts/Capability.c [S17] rts/Schedule.c [S18] rts/StgCRun.c [S19] rts/StgStartup.cmm [S20] rts/StgMiscClosures.cmm [S21] rts/HeapStackCheck.cmm [S22] rts/Threads.c [S23] rts/Task.c [S24] rts/Timer.c [S25] rts/sm/GC.c [S26] rts/Sparks.c [S27] rts/WSDeque.c [S28] rts/STM.h [S29] rts/posix/Signals.c [S30] rts/win32/ThrIOManager.c [S31] libraries/base/GHC/MVar.hs [S32] libraries/base/GHC/Conc/IO.hs [S33] libraries/base/GHC/Conc/Sync.lhs [S34] libraries/base/GHC/Event/Manager.hs [S35] libraries/base/GHC/Event/Thread.hs [S36] libraries/base/GHC/IO/BufferedIO.hs [S37] libraries/base/GHC/IO/FD.hs [S38] libraries/base/GHC/IO/Handle/Text.hs [S39] libraries/base/System/IO.hs [S40] libraries/base/System/Posix/Internals.hs [S41] AutoApply.o (utils/genapply/GenApply.hs) Connect the algorithm and transistor
© Copyright 2024 ExpyDoc