The Illustrated GHC [pdf]

GHC(STG,Cmm,asm) illustrated
for hardware persons
exploring some mental models and implementations
Takenobu T.
Rev. 0.01.1
WIP
“Any sufficiently advanced technology is
indistinguishable from magic.”
Arthur C. Clarke
NOTE
- This is not an official document by the ghc development team.
- Please don't forget “semantics”. It's very important.
- This is written for ghc 7.8 (and ghc 7.10).
Contents
- Executable binary
- Compile steps
- Runtime System
- Development languages
- Machine layer/models
- STG-machine
- Heap objects in STG-machine
- STG-machine evaluation
- Pointer tagging
- Thunk and update
- Allocate and free heap objects
- STG - C land interface
- Thread
- Thread context switch
- Creating main and sub threads
- Thread migration
- Heap and Threads
- Threads and GC
- Bound thread
- Spark
- Mvar
- Software transactional memory
- FFI
- IO and FFI
- IO manager
- Bootstrap
- References
Executable binary
The GHC = Compiler + Runtime System (RTS)
Haskell
source (.hs)
GHC (compile)
object (.o)
RuntimeSystem
(libHsRts.o)
libraries
(GHC.Base, ...)
GHC (link)
Executable
binary
including the RTS
(* static link case)
References : [1], [C1], [C3], [C10], [C19], [S7]
Compile steps
GHC transitions between five representations
Haskell language
GHC
compile
steps
Core language
STG language
Cmm language
Assembly language
(native or llvm)
each intermediate code can
be dumped by :
% ghc -ddump-parsed
% ghc -ddump-rn
% ghc -ddump-ds
% ghc -ddump-simpl
% ghc -ddump-prep
% ghc -ddump-stg
% ghc -ddump-cmm
% ghc -ddump-opt-cmm
% ghc -ddump-llvm
% ghc -ddump-asm
References : [1], [C3], [C4], [9], [C5], [C6], [C7], [C8[], [S7], [S8]
Runtime System
Generated binary includes the RTS
Haskell user code
executable
binary
Runtime System
software
hardware
Library
(* static link case)
OS
(Linux, FreeBSD, Win, ...)
Physical Processor
(x86, ARM, ...)
References : [C10], [9]
Runtime System includes ...
Runtime System
Storage Manager
Byte-code interpreter
Software
Transactional Memory
User space
Scheduler
Profiling
...
References : [C10], [8], [9], [5], [17], [S13]
Development languages
The GHC is developed by some languages
compiler
runtime system
( $(TOP)/compiler/*)
( $(TOP)/rts/*)
Haskell
+
Alex (lex)
Happy (yacc)
Cmm (C--)
Assembly
C
+
Cmm
Assembly
library
( $(TOP)/libraries/*)
Haskell
+
C
References : [C2]
Machine layer/models
Machine layer
STG-machine
(Abstract machine)
HEC - Haskell Execution Context
(Capability, Virtual processor)
Physical Processor
(x86, ARM, ...)
Each Haskell code is executed in STG semantics.
References : [C14], [C6], [2], [C17], [8], [S15], [S16], [S11]
Machine layer
STG Registers
STG-machine
(Abstract machine)
Stack
Heap
Static region
Heap
Static region
BaseReg
R1
Hp
Sp
:
Register table
HEC - Haskell Execution Context
(Capability,
Virtual processor)
BaseReg
R1
Hp
:
Registers
Physical Processor
(x86, ARM, ...)
Memory
ebx
esi
edi
ebp
:
References : [C14], [C6], [2], [C17], [8], [S15], [S16], [S11]
Runtime system and HEC
user code
Haskell user code
by STG-semantics
STG-machine
runtime system
HECs
user space
Runtime System
OS Threads
(native threads, kernel threads)
OS Process
OS
OS
(Linux, FreeBSD, Win, ...)
supervisor
space
(kernel space)
hardware
Physical Processor
(x86, ARM, ...)
References : [C14], [C6], [2], [C17], [8], [S15], [S16], [S11]
many HECs
Multi HECs can be generated by compile and runtime options :
$ ghc -rtsopts -threaded
$ ./xxx +RTS -N4
HEC
(Capability,
Virtual processor)
Task
(Worker Thread)
HEC
HEC
HEC
HEC
Tasks
(abstract OS Thread)
OS Threads
OS Process
software
hardware
OS
(Linux, FreeBSD, Win, ...)
Physical Processor
(x86, ARM, ...)
References : [1], [5], [8], [9], [14], [C17], [C11], [19], [S17], [S16], [S23], [S22], [S14]
HEC (Capability) data structure
[rts/Capability.h] (ghc 7.8)
struct Capability_ {
StgFunTable f;
StgRegTable r;
register table
nat no;
Task *running_task;
rtsBool in_haskell;
nat idle;
run queue
rtsBool disabled;
StgTSO *run_queue_hd;
StgTSO *run_queue_tl;
InCall *suspended_ccalls;
bdescr **mut_lists;
bdescr **saved_mut_lists;
bdescr *pinned_object_block;
bdescr *pinned_object_blocks;
int context_switch;
int interrupt;
#if defined(THREADED_RTS)
Task *spare_workers;
nat n_spare_workers;
Mutex lock;
Task *returning_tasks_hd;
Task *returning_tasks_tl;
Message *inbox;
SparkPool *sparks;
SparkCounters spark_stats;
#endif
}
W_ total_allocated;
StgTVarWatchQueue *free_tvar_watch_queues;
StgInvariantCheckQueue *free_invariant_check_queues;
StgTRecChunk *free_trec_chunks;
StgTRecHeader *free_trec_headers;
nat transaction_tokens;
Each HEC (Capability) has a register table and a run queue and ...
Each HEC (Capability) is initialized at initCapabilities [rts/Capability.c]
References : [S15], [S16], [C11], [C17]
STG-machine
The STG-machine consists of three parts
STG Registers
BaseReg
R1
Hp
HpLim
Sp
SpLim
:
Stack
Sp
grows downwards
SpLim
Heap
HpLim
Hp
grows upwards
References : [2], [C15], [C11], [C12]
STG-machine is mapped to physical processor
logical view
STG Registers
physical view
physical register
(x86 example)
Register table
BaseReg
R1
Hp
Sp
ebx
esi
edi
ebp
ebx
esi
edi
:
:
...
References : [C15], [S1], [S2]
STG-machine is mapped to physical processor
logical view
physical view
Heap memory
Stack
TSO
Thread State Object
Heap
Stack
A stack and a TSO object are in the heap.
The stack is stored separately from the TSO for size extension and GC.
References : [C11], [C12], [S16], [S5]
TSO data structure
[includes/rts/storage/TSO.h] (ghc 7.8)
typedef struct StgTSO_ {
StgHeader
header;
struct StgTSO_*
_link;
struct StgTSO_*
global_link;
struct StgStack_
*stackobj;
link to stack object
StgWord16
what_next;
StgWord16
why_blocked;
StgWord32
flags;
StgTSOBlockInfo
block_info;
StgThreadID
id;
StgWord32
saved_errno;
StgWord32
dirty;
struct InCall_*
bound;
struct Capability_* cap;
struct StgTRecHeader_ * trec;
struct MessageThrowTo_ * blocked_exceptions;
struct StgBlockingQueue_ *bq;
StgWord32 tot_stack_size;
} *StgTSOPtr;
A TSO object is only ~17words + stack. Lightweight!
References : [S5]
Heap objects in STG-machine
Every heap object is represented uniformly
header
payload
info ptr
info table
meta data
entry code
actual machine code
Closure (header + payload) + Info Table + Entry Code
References : [C11], [S3], [S4], [S6], [2]
Heap object (closure)
logical view
header
payload
info ptr
physical view
heap memory
payload1
payload0
info ptr
info table
static memory
entry code
info table
entry code
References : [C11], [S3], [C9], [C8], [2]
Closure examples : Char, Int
'a' :: Char
header
C#
info
ptr
7 :: Int
payload
header
'a‘#
I#
GHC.Types.C#_static_info
layout : 0_1
type : CONSTR
bitmap :
inc %esi
jmp *0x0(%ebp)
info
ptr
payload
7#
GHC.Types.I#_static_info
info table
layout : 0_1
type : CONSTR
bitmap :
info table
entry code
inc %esi
jmp *0x0(%ebp)
entry code
References : [C11], [S3], [C9], [C8], [2], [S20]
Closure example (code)
[Example.hs]
module Example where
value1 :: Int
value1 = 7
[ghc -O -ddump-stg Example.hs]
STG
Example.value1 :: GHC.Types.Int
[GblId, Caf=NoCafRefs, Str=DmdType m, Unf=OtherCon []] =
NO_CCS GHC.Types.I#! [8];
Cmm
[ghc -O -ddump-opt-cmm Example.hs]
section "data" { __stginit_main:Example:
}
section "data" {
Example.value1_closure:
const GHC.Types.I#_static_info;
const 7;
}
section "relreadonly" { SMc_srt:
}
[ghc -O -ddump-asm
Example.hs]
.data
asm
.align 4
.align 1
.globl __stginit_main:Example
__stginit_main:Example:
.data
.align 4
.align 1
.globl Example.value1_closure
Example.value1_closure:
.long GHC.Types.I#_static_info
.long 7
.section .data
header
.align 4
I#
.align 1
SMd_srt:
payload
7#
References : [C11], [S3], [C9], [C8], [2], [S20]
Closure examples : Maybe
Just 7 :: Maybe Int
header
payload
header
Just
info
ptr
I#
Data.Maybe.Just_static_info
layout :
type : CONSTR
bitmap :
add
jmp
info table
$0x2,%esi entry code
*0x0(%ebp)
info
ptr
payload
7#
GHC.Types.I#_static_info
layout :
type : CONSTR
bitmap :
info table
inc %esi
entry code
jmp *0x0(%ebp)
References : [C11], [S3], [C9], [C8], [2], [S20]
Closure examples : List
[ 1, 2 ] :: [Int]
header
Cons
info
ptr
GHC.Types.[]_closure
payload
I#
Cons
1#
GHC.Types.:_static_info
layout :
type : CONSTR
bitmap :
add
jmp
$0x2,%esi
*0x0(%ebp)
Nil
I#
2#
GHC.Types.[]_static_info
layout :
type : CONSTR
bitmap :
inc %esi
jmp *0x0(%ebp)
References : [C11], [S3], [C9], [C8], [2], [S20]
Closure examples : Thunk
"thunk"
x + 1 :: Int
(free variable : x = 7)
header
x+1
header
payload
reserved
info
ptr
x
I#
payload
7#
info
ptr
info table
type : THUNK
info table
type : CONSTR
entry code for
λx -> x + 1
entry code for
I#
References : [C11], [S3], [C9], [C8], [2], [S20]
STG-machine evaluation
STG evaluation flow
stack
Sp
continuation
stack
current
expression
R1
(1)
push a continuation code
(next code) to the stack
top
(2)
enter to R1 closure
jump
a value
Sp
continuation
R1
(3)
set a result to R1
(4)
jump (return)
to the stack top code
(5)
repeat from (1)
References : [C8], [3]
Enter to a closure
unevaluated closure
R1
(1) read R1
to get a closure address
(2) read header(info ptr)
to get a Entry code address
(3) jump to the Entry code address
(4) execute the Entry code
(5) set a result to R1
header
payload
info
ptr
layout
closure type
...
if ((Sp + -12) < SpLim)
goto c3h9;
else goto c3ha;
:
Info table
Entry code
(6) jump to the stack top address
(continuation)
References : [C11], [C9], [C8], [10], [3], [2], [12]
Pointer tagging
Pointer tagging
header
(info ptr)
pointer
payload
R1 or ...
pointer
00
01
10
11
... an unevaluated closure
... an evaluated closure;
1st constructor value or evaluated.
(for instance: "Nothing" )
... an evaluated closure; 2nd constructor value.
(for instance: "Just xx")
... an evaluated closure; 3rd constructor value.
* 32bit machine case
fast judgment!
check only pointer's lower bits without evaluating the closure.
References : [4], [2], [C16]
Thunk and update
Thunk and update
"thunk"
x + 1 :: Int (free variable : x = 7)
payload
header
Thunk
info
ptr
x
type : THUNK
(empty)
I#
#7
evaluate ( x + 1 )
x+1
Thunk
info
ptr
(empty)
x
type : THUNK
x+1
I#
#7
I#
#8
update
Ind (indirect)
info
ptr
type : THUNK
x+1
lock free
I#
#7
I#
#8
type : IND
GC (eliminate Indirect)
References : [3], [2], [C8]
Allocate and free heap objects
Allocate heap objects
heap memory (nursery)
HpLim
Hp
allocate
(without malloc)
HpLim
Hp
can't allocate
because full
HpLim
Hp
if (Hp > HpLim ) goto ...
call stg_gc_...
References : [C11], [C13], [8], [9], [5], [15], [12], [13], [19], [S25]
free and collect heap objects
HpLim
copying
collection
(minor GC)
from space
if (Hp > HpLim ) goto ...
call stg_gc_...
to space
References : [C11], [C13], [8], [9], [5], [15], [12], [13], [19], [S25]
STG - C land interface
STG (Haskell) land - C land interface
User code
R1
STG land
(Haskell land)
C land
RtsAPI
StgRun
StgReturn
result
function f
BaseReg
Runtime System
(Scheduler)
References : [S18], [S17], [S19], [S21]
Thread
Thread layer (single core)
Haskell Threads
Haskell
Threads
HEC
(Capability,
Virtual processor)
exclusive execution
...
HEC
user space
OS Thread
OS Process
OS
software
(Linux, FreeBSD, Win, ...)
hardware
Physical Processor
supervisor
space
(x86, ARM, ...)
References : [5], [8], [9], [14], [C17], [C11], [19], [S17], [S16], [S23], [S22], [S14]
Thread layer (multi core)
Haskell Threads
Haskell Threads
Haskell
Threads
HEC
(Capability,
Virtual processor)
...
...
HEC
HEC
user space
OS Thread
OS Thread
OS Process
software
hardware
supervisor
space
OS
(Linux, FreeBSD, Win, ...)
Physical Processor
Physical Processor
(x86, ARM, ...)
(x86, ARM, ...)
*Threaded option case (ghc -threaded)
References : [5], [8], [9], [14], [C17], [C11], [19], [S17], [S16], [S23], [S22], [S14]
Thread context switch
Threads and context switch
logical view
Thread
states
Thread #0
Thread #1
Registers
Registers
Registers
Stack
Stack
Stack
load
state
STG-machine
Registers
Stack
Thread #2
...
interleaved
exclusive execution
save
state
execution and
pre-empted via the context switch
heap
References : [5], [8], [9], [14], [C17], [C11], [19], [S17], [S16], [S23], [S22], [S14]
Threads and TSOs
logical view
Thread
states
Thread #0
Thread #1
Registers
Registers
Stack
load
state
STG-machine
Stack
save
state
physical view
Thread #2
heap memory
Registers
Stack
...
TSO #0
(Thread State
Object)
TSO #1
(Thread State
Object)
Registers
Stack
TSO #2
(Thread State
Object)
heap
References : [5], [8], [9], [14], [C17], [C11], [19], [S17], [S16], [S23], [S22], [S14]
Scheduling by run queue
STG-machine
Registers
Stack
STG land
(Haskell land)
StgRun
C land
StgReturn
Scheduler
run queue
popRunQueue
appendToRunQueue
round robin
...
heap
TSO
TSO
...
References : [5], [8], [9], [14], [C17], [C11], [19], [S17], [S16], [S23], [S22], [S14]
Context switch flow
Haskell thread
heap
heap
HpLim
-> 0
(2) HpLim -> 0
RTS
Interrupt
(8) context switch
(3) heap size check
if (Hp > HpLim)
TSO
(4) call GC
TSO
GC
Scheduler
(5) HpLim check
HpLim = 0 ?
(7) scheduling
(6) goto schedule
(1) interrupt
platform OS timer
context switch at safe points
References : [5], [8], [9], [14], [C17], [C11], [19], [S17], [S16], [S23], [S22], [S14], [S24]
Context switch flow (code)
stg_gc_noregs
if (HpLim == 0) {
jump stg_returnToSched [R1];
STG land
(Haskell land)
C land
cap->r.rHpLim = NULL;
stopCapability
stg_returnToSched
W_ r1;
r1 = R1; // foreign calls may clobber R1
SAVE_THREAD_STATE();
foreign "C" threadPaused(MyCapability()
"ptr", CurrentTSO);
R1 = r1;
jump StgReturn [R1];
schedule
contextSwitchCapability
contextSwitchAllCapabilities
handle_tick
next
handle_tick ..
CreateTimerQueue
initTicker
initTimer
startTimer
hs_init_ghc
real_main
OS
*Windows case
References : [5], [8], [9], [14], [C17], [C11], [19], [S17], [S16], [S23], [S22], [S14], [S24]
Creating main and sub threads
Create a main thread
Runtime
System
Runtime system bootstrap code [rts/RtsAPI.c]
rts_evalLazyIO
createIOThread
createThread ... (1), (2), (3)
pushClosure ... (4)
scheduleWaitThread
appendToRunQueue ... (5)
scheduler
run queue
(5)
(1)
TSO
heap memory
(2)
stack
(3)
*stackobj
closure
(4)
static memory
ZCMain_main_closure
header
payload
info table
entry code
References : [5], [8], [9], [14], [C17], [C11], [19], [S17], [S16], [S23], [S22], [S14], [S24]
Create a sub thread using forkIO
Haskell Threads
forkIO
stg_forkzh
ccall createIOThread ... (1), (2), (3), (4)
ccall scheduleThread ... (5)
User code
Runtime System
append
(5)
scheduler
run queue
(1)
TSO
heap memory
forked closure
(2)
*stackobj
(3)
stack
(4)
header
payload
closure
info table
static memory
entry code
References : [5], [8], [9], [14], [C17], [C11], [19], [S17], [S16], [S23], [S22], [S14], [S24]
Thread migration
Threads are migrated to idle HECs
Haskell
Threads
Idle HEC
run queue
Idle HEC
run queue
run queue
Idle HEC
run queue
empty
empty
empty
HEC
HEC
HEC
HEC
OS Thread
OS Thread
OS Thread
OS Thread
Physical Processor
Physical Processor
Work pushing
OS Process
hardware
Physical Processor
Physical Processor
References : [5], [8], [9], [14], [C17], [C18], [S17], [S16], [S23], [S24]
Heap and Threads
Threads shared a heap
Haskell Threads
Registers
Registers
Stack
Stack
...
Registers
Registers
Stack
Stack
...
heap memory (shared)
static memory (shared)
hardware
HEC
HEC
Physical Processor
Physical Processor
References : [5], [8], [9], [14], [C17], [C11], [19], [S17], [S16], [S23], [S22], [S14], [S17], [S16], [S25]
Local allocation area (nursery)
Haskell Threads
heap memory
(shared)
Registers
Registers
Stack
Stack
Registers
Stack
Stack
nursery
...
nursery
generation N
static memory
(shared)
hardware
...
Registers
static memory
HEC
HEC
Physical Processor
Physical Processor
fast access using nursery for each processors
References : [5], [8], [9], [14], [C17], [C11], [19], [S17], [S16], [S23], [S22], [S14], [S17], [S16], [S25]
Threads and GC
GC, nursery, generation, aging, promotion
Haskell
Thread
STG land
(Haskell land)
Runtime System
allocate
generation 0, step 0
Hp
nursery
HEC
nursery
aging
HEC
nursery
HEC
nursery
HEC
generation 0, step 1
promotion
generation 1
heap memory
References : [8], [9], [15], [C13], [C11], [S25]
Threads and minor GC
sequential GC for young generation (minor GC)
“stop-the-world” GC
Haskell
Threads
GC thread
GC thread
GC thread
GC thread
nursery
nursery
nursery
nursery
generation 0, step 1
heap
generation 1
HEC
HEC
HEC
HEC
Physical Processor
Physical Processor
Physical Processor
Physical Processor
References : [8], [9], [15], [C13], [C11], [S25]
Threads and major GC
parallel GC for oldest generation (major GC)
“stop-the-world” GC
Haskell
Threads
GC thread
GC thread
GC thread
GC thread
nursery
nursery
nursery
nursery
generation 0, step 1
heap
generation 1
HEC
HEC
HEC
HEC
Physical Processor
Physical Processor
Physical Processor
Physical Processor
References : [8], [9], [15], [C13], [C11], [S25]
GC discover live objects from the root
Runtime System
scheduler
run queue
root
heap memory
stack
closure
TSO
TSO
closure
stack
TSO
closure
reachable from root
GC aging or promote
unreachable
(garbage)
free
References : [8], [9], [15], [C13], [C11], [S25]
Bound thread
A bound thread has a fixed associated OS Thread
forkOS
Bound
Thread
Haskell
Threads
HEC
(Capability,
Virtual processor)
fixed association
for safe foreign calls
HEC
OS Thread
OS Thread
OS Process
Foreign calls from a bound thread are all made by the same OS thread.
A bound thread is created using forkOS.
The main thread is bound thread.
References : [6], [5], [8], [9], [14], [C17], [19], [S17], [S16], [S23], [S22]
forkIO, forkOn, forkOS
forkIO
forkOS
forkOn
Bound
Threads
Haskell
Threads
affinity
bound
HEC
HEC
HEC
OS Thread
OS Thread
OS Thread
create a haskell unbound
thread
create a haskell unbound
thread
on the specified HEC
OS Thread
create a haskell bound thread
and an OS thread
References : [6], [5], [8], [9], [14], [C17], [19], [S17], [S16], [S23], [S22]
Spark
Spark layer
serial execution on each Spark Threads
Sparks
Haskell
Thread
HEC
(Capability,
Virtual processor)
...
Spark Thread
HEC
user space
OS Thread
OS Process
OS
software
(Linux, FreeBSD, Win, ...)
hardware
Physical Processor
supervisor
space
(x86, ARM, ...)
Spark Threads are generated on idle HECs.
References : [C17], [19], [S17], [S26], [S27], [S33], [S12]
Sparks and Spark pool
logical view
Spark
Spark
Spark
Spark
Spark Thread
Spark Thread
Spark Thread
Spark Thread
HEC
HEC
HEC
HEC
Physical Processor
Physical Processor
Physical Processor
Physical Processor
Spark pool
rpar
Spark
(Thunk)
References : [C17], [19], [S17], [S26], [S27], [S33], [S12]
Spark pool and work stealing
physical view
Spark
Spark
Spark Thread
Spark
Spark Thread
Spark
Spark Thread
Spark Thread
Spark pool
Spark pool
Work stealing
HEC
Spark pool
Spark pool
References : [C17], [19], [S17], [S26], [S27], [S33], [S12]
Sparks and closures
Spark Thread
runSparks
STG land
(Haskell land)
getSpark
C land
Spark pool
(WSDeque)
HEC
rpar
push
...
heap
closure
(thunk)
closure
(thunk)
...
(not TSO objects, but closures. therefore very lightweight)
References : [C17], [19], [S17], [S26], [S27], [S33], [S12]
MVar
MVar
Haskell Thread #0
Haskell Thread #1
putMVar
takeMVar
empty?
or
full?
MVar
References : [16], [18], [19], [S31], [S12]
MVar and blocking
Haskell Thread
Haskell Thread
putMVar
BLOCKED
if full
takeMVar
full
empty
MVar
MVar
BLOCKED
if empty
References : [16], [18], [19], [S31], [S12]
MVar example
Haskell Thread #0
Haskell Thread #1
(3)
takeMVar
(2)
(1)
putMVar
MVar
timer context switch
time
Thread #0
Running
Blocked
Non Blocked
takeMVar
(1)
MVar
Running
(3) wakeup and takeMVar (atomic)
empty
full
putMVar
empty
(2)
Running
Thread #1
* single core case
References : [16], [18], [19], [S31], [S12]
MVar object view
User view
logical MVar object
heap
MVar
empty?
or
full?
physical MVar object
StgReturn
FIFO
of
StgMVarTSOQueue
head
tail
head
tail
value
closure
value
StgMVarTSOQueue
StgMVarTSOQueue
TSO
TSO
References : [16], [18], [19], [S31], [S12]
newEmptyMVar
Haskell Threads
newEmptyMVar
newMVar#
(1) call the Runtime primitive
Runtime System
stg_newMVarzh
ALLOC_PRIM_
SET_HDR
StgMVar_head
StgMVar_tail
StgMVar_value
(2) create a MVar object in the heap
MVar object
heap
stg_END_TSO_QUEUE_closure
head
tail
value
(3) link each fields
References : [16], [18], [19], [S31], [S12]
takeMVar (empty case)
Haskell Threads
takeMVar
takeMVar#
Runtime System
stg_takeMVarzh
create StgMVarTSOQueue … (1)
append
… (2)
StgReturn … (3)
(3) return to the scheduler
(1) create
StgMVarTSOQueue
FIFO of StgMVarTSOQueue
head
(2) append
tail
value
MVar object
References : [16], [18], [19], [S31], [S12]
takeMVar (full case)
Haskell Threads
takeMVar
takeMVar#
Runtime System
stg_takeMVarzh
(1) get value
(2) set empty
(3) remove head
(4) tryWakeupThread
(3) remove head
scheduler
run queue
fairness round robin
head
append
tail
value
(1) get value
(2) set empty
(4) wakeup
Only one of the blocked threads becomes unblocked.
MVar object
References : [16], [18], [19], [S31], [S12]
Software transactional memory
Create a atomic block using atomically
atomically :: STM a -> IO a
atomically
STM a
writeTVar
writeTVar
readTVar
readTVar
TVar
readTVar
TVar
transactional
variable
Create and evaluate a composable “atomic block”
Atomic block = All or Nothing
References : [17], [19], [20], [C18], [S12], [S28]
Rollback and blocking control using retry
retry :: STM a
STM a
writeTVar
retry :: STM a
readTVar
TVar
TVar
Discard, blocking and try again
References : [17], [19], [20], [C18], [S12], [S28]
Compose OR case using orElse
orElse :: STM a -> STM a -> STM a
STM a
STM a
A
A
orElse ::
STM a
result
STM a -> STM a -> STM a
B
if no retry A
or
B
if retry A
A or B or Nothing
References : [17], [19], [20], [C18], [S12], [S28]
STM, TVar example (normal case)
STM a
writeTVar
TVar
TVar
atomic block (critical section)
time
Thread #0
writeTVar
writeTVar writeTVar
commit
TVar #A
old value
new value
TVar #B
old value
new value
atomic update
References : [17], [19], [20], [C18], [S12], [S28]
STM, TVar example (conflict case)
atomic block
rollback and
try again
atomic block
time
Thread #0
commit
old value
TVar #A
TVar #B
commit
old value
new value
other value
new value
commit
other thread
different
References : [17], [19], [20], [C18], [S12], [S28]
retry example
atomic block
retry
atomic block
time
blocked
Thread #0
wake up
TVar #A
TVar #B
commit
old value
new value
old value
changed value
new value
commit
other thread
References : [17], [19], [20], [C18], [S12], [S28]
STM, TVar data structure
retry
atomically
newTVar
StgTVar
StgTVar
writeTVar
readTVar
StgTVarWatchQueue
CurrentTSO
StgTVarWatchQueue
…
TRecEntry
TRecEntry
StgTRecHeader
StgTRecChunk
atomically
StgInvariantCheckQueue
References : [17], [19], [20], [C18], [S12], [S28]
newTVar, writeTVar, readTVar
newTVar
writeTVar
readTVar
or
Runtime System
TRecEntry
(transaction
Record)
TRecEntry
TRecEntry
invariant check
when commit
TVar
StgTVar
heap
References : [17], [19], [20], [C18], [S12], [S28]
block by retry, wake up by commit
retry :: STM a
Runtime System
commit by other thread
(4) commit
(1) retry
TSO
StgTVar
(3)yield
scheduler
run queue
WatchQueue
(2) append
TVar
…
(5)append
…
(4) commit
wakeup all threads in the WatchQueue
no guarantee of fairness,
because the RTS has to run all the blocked transaction.
References : [17], [19], [20], [C18], [S12], [S28]
FFI
FFI (Foreign Function Interface)
Haskell thread
ccall
STG land
(Haskell land)
foreign
out-call
StgReturn
Stg interface
C land
FFI
foreign
in-call
StgRun
Scheduler
FFI management
user space
Runtime system
Foreign
C code
OS API (system call)
OS
supervisor
space
References : [6], [11], [20], [S39], [S38], [S37], [S36], [S40]
FFI and OS Threads
Haskell
Threads
(4)
Haskell
Threads
non-blocked
(2)
(1)
HEC
HEC
(5)
Foreign
C code
OS Thread
Physical Processor
(1) a safe foreign call (FFI)
HEC
Foreign
C code
(3)
OS Thread
OS Thread
Physical Processor
(2) move the HEC to other OS thread
(3) spawn or draw an OS thread
(4) move Haskell threads
(5) call the foreign C code
References : [6], [11], [20], [S39], [S38], [S37], [S36], [S40]
A safe foreign call (code)
Haskell Threads
ccall suspendThread
ccall FOREIGN_C_CODE
ccall resumeThread
releaseCapability_
giveCapabilityToTask
startWorkerTask
createOSThread
… (3)
non-blocked
Haskell
Threads
(3)
waitForReturnCapability
… (1)
… (2)
… (4)
(1)
HEC
HEC
(4)
Foreign
C code
(2)
OS Thread
OS Thread
Physical Processor
References : [6], [11], [20], [S39], [S38], [S37], [S36], [S40]
a safe and an unsafe foreign call
a safe foreign call
an unsafe foreign call
non-blocked
Haskell
Threads
blocking
safe
foreign call
blocked
Haskell
Threads
HEC
HEC
Foreign
C code
blocking
unsafe
foreign call
HEC
Foreign
C code
OS Thread
OS Thread
Physical Processor
OS Thread
Physical Processor
faster,
but blocking to the other Haskell threads
References : [6], [11], [20], [S39], [S38], [S37], [S36], [S40]
Safe/unsafe foreign call and bound/unbound thread
a safe foreign call
non-blocked
unbound
thread
an unsafe foreign call
non-blocked
non-blocked
ccall
an unbound
thread
HEC #0
Processor #0
non-blocked
blocked
ccall
HEC #1
HEC #1
HEC #0
Foreign
C code
OS Thread
unbound
thread
OS Thread
Foreign
C code
OS Thread
Processor #1
non-blocked
HEC #1
bound
thread
OS Thread
OS Thread
Processor #0
Processor #1
non-blocked
blocked
ccall
a bound
thread
HEC #0
HEC #1
HEC #1
ccall
HEC #0
HEC #1
Foreign
C code
OS Thread
Processor #0
OS Thread
OS Thread
Processor #1
bound
thread
HEC #1
Foreign
C code
OS Thread
Processor #0
OS Thread
OS Thread
Processor #1
References : [6], [11], [20], [S39], [S38], [S37], [S36], [S40]
IO and FFI
IO
Haskell Thread
getLine (IO)
IO String
?
References : [6], [11]
IO example: getLine
getLine
IO String
standard IO lib
STG land
(Haskell land)
c_read
FFI (Foreign Function Interface)
C land
user
space
C IO lib (libio.a)
“read” system call
OS API (system call)
supervisor
space
OS
Device driver
Hardware
UART, serial, ...
References : [6], [11], [20], [S39], [S38], [S37], [S36], [S40]
IO example: getLine (code)
User code
getLine
hGetLine
hGetLineBuffered
hGetLineBufferedLoop
maybeFillReadBuffer
getSomeCharacters
readTextDevice
Library
Buffered.fillReadBuffer
readBuf’
readBuf
RawIO.read
STG land
(Haskell land)
fdRead
readRawBufferPtr
c_read
… switch safe/unsafe,
non-threaded/ioManager
FFI (Foreign Function Interface)
C land
read
Runtime
System
OS API (system call)
OS
References : [6], [11], [20], [S39], [S38], [S37], [S36], [S40]
IO manager
IO manager (single core)
Haskell Threads
IO manager
dispatcher thread
HEC
HEC
OS Thread
OS Thread
affinity
user space
OS Process
software
hardware
OS
(Linux, FreeBSD, Win, ...)
supervisor
space
Physical Processor
(x86, ARM, ...)
*Threaded option case (ghc -threaded)
References : [7], [5], [8]
IO manager (multi core)
Haskell
Threads
Haskell
Threads
IO manager
dispatcher
thread
IO manager
dispatcher
thread
HEC
HEC
HEC
HEC
OS Thread
OS Thread
OS Thread
OS Thread
user space
OS Process
OS
software
hardware
supervisor
space
(Linux, FreeBSD, Win, ...)
Physical Processor
Physical Processor
(x86, ARM, ...)
(x86, ARM, ...)
*Threaded option case (ghc -threaded)
References : [7], [5], [8]
IO manager
Haskell Threads
IO manager
event table
blocking IO
registerFD
event
loop
request and
set callback (MVar)
IO access
(epoll, kqueue)
takeMVar
(wait and wake up)
putMVar
HEC
HEC
OS Thread
OS Thread
OS Process
*Threaded option case (ghc -threaded)
system call
References : [7], [5], [8], [S29], [S30], [S32], [S37], [S35], [S3]
Bootstrap
Bootstrap sequence
ZCMain_main_closure
User
code
stg_ap_v_info
stg_enter_info
loadThreadState
emitLoadThreadState
STG land
(Haskell land)
stg_returnToStackTop
StgRun
schedule
C land
scheduleWaitThread
main
createIOThread
createThread, pushClosure
rts_evalLazyIO
hs_init_ghc
initScheduler, initStorage, initTimer, ioManagerStart, ...
real_main
hs_main
Runtime
System
(lib/libHSrts.a)
mainCRTStartup
(*Windows case)
OS
References : [S7], [S13], [S14], [S17], [S18], [S19], [S9], [S10], [S21], [S41]
Exit sequence
User
code
STG land
(Haskell land)
stg_stop_thread_info
StgReturn
C land
Runtime
System
(lib/libHSrts.a)
schedule
shutdownHaskellAndExit
stg_exit
exit
OS
References : [S19], [S18], [S17]
Initializing
STG land
(Haskell land)
startIOManagerThread
startIOManagerThreads
ensureIOManagerIsRunning
create IO Manager Threads (+RTS -N)
ioManagerStart
C land
initTimer, startTimer
create Timer
initGcThreads
allocNurseries
storageAddCapabilities
initGeneration
initStorage
initializing GC Threads (+RTS -N)
create nurseries (+RTS -N)
createOSThread
newTask
startWorkerTask
startWorkerTasks
initScheduler
create OS Threads (+RTS -N)
create Tasks (+RTS -N)
initCapabilities
create capabilities (+RTS -N)
create GC generations (+RTS -GN)
Runtime
System
hs_init_ghc
real_main
hs_main
main
OS
References : [1], [S7], [S13], [S14], [S17], [S15], [S16], [S24], [S21], [S34]
Create each layers
User code
Sparks
User
code
Runtime
system
Haskell threads
OS Threads
software
hardware
newSpark
forkIO
forkOn
forkOS
stg_forkzh, createThread
stg_forkOnzh, createThread
forkOS_createThread
OS Process
OS API
initCapability
(Capability, Virtual processor)
(Worker thread) abstract OS Thread
OS
rpar
HECs
Tasks
Runtime system
forkOS
newTask
forkOS
createOSThread
pthread_create
fork
OS
(Linux, FreeBSD, Win, ...)
Physical Processor
(x86, ARM, ...)
References : [1], [5], [8], [9], [C11], [C17], [S12], [S26], [S22], [S15], [S23]
References
References
[1]
The Glorious Glasgow Haskell Compilation System User's Guide
https://downloads.haskell.org/~ghc/latest/docs/html/users_guide/index.html
[2]
Implementing lazy functional languages on stock hardware: the Spineless Tagless G-machine Version 2.5
http://research.microsoft.com/en-us/um/people/simonpj/Papers/spineless-tagless-gmachine.ps.gz
[3]
Making a Fast Curry Push/Enter vs Eval/Apply for Higher-order Languages
http://research.microsoft.com/en-us/um/people/simonpj/papers/eval-apply/
[4]
Faster Laziness Using Dynamic Pointer Tagging
http://research.microsoft.com/en-us/um/people/simonpj/papers/ptr-tag/ptr-tagging.pdf
[5]
Runtime Support for Multicore Haskell
http://research.microsoft.com/en-us/um/people/simonpj/papers/parallel/multicore-ghc.pdf
[6]
Extending the Haskell Foreign Function Interface with Concurrency
http://community.haskell.org/~simonmar/papers/conc-ffi.pdf
[7]
Mio: A High-Performance Multicore IO Manager for GHC
http://haskell.cs.yale.edu/wp-content/uploads/2013/08/hask035-voellmy.pdf
[8]
The GHC Runtime System
web.mit.edu/~ezyang/Public/jfp-ghc-rts.pdf
[9]
The GHC Runtime System
http://www.scs.stanford.edu/14sp-cs240h/slides/ghc-rts.pdf
[10]
Evaluation on the Haskell Heap
http://blog.ezyang.com/2011/04/evaluation-on-the-haskell-heap/
References
[11]
IO evaluates the Haskell Heap
http://blog.ezyang.com/2011/04/io-evaluates-the-haskell-heap/
[12]
Understanding the Stack
http://www.well-typed.com/blog/94/
[13]
Understanding the RealWorld
http://www.well-typed.com/blog/95/
[14]
The GHC scheduler
http://blog.ezyang.com/2013/01/the-ghc-scheduler/
[15]
GHC’s Garbage Collector
http://www.mm-net.org.uk/workshop190404/GHC's_Garbage_Collector.ppt
[16]
Concurrent Haskell
http://www.haskell.org/ghc/docs/papers/concurrent-haskell.ps.gz
[17]
Beautiful Concurrency
https://www.fpcomplete.com/school/advanced-haskell/beautiful-concurrency
[18]
Anatomy of an MVar operation
http://blog.ezyang.com/2013/05/anatomy-of-an-mvar-operation/
[19]
Parallel and Concurrent Programming in Haskell
http://community.haskell.org/~simonmar/pcph/
[20] Real World Haskell
http://book.realworldhaskell.org/
References
The GHC Commentary
[C1]
[C2]
[C3]
[C4]
[C5]
[C6]
[C7]
[C8]
[C9]
[C10]
[C11]
[C12]
[C13]
[C14]
[C15]
[C16]
[C17]
[C18]
[C19]
https://ghc.haskell.org/trac/ghc/wiki/Commentary
https://ghc.haskell.org/trac/ghc/wiki/Commentary/SourceTree
https://ghc.haskell.org/trac/ghc/wiki/Commentary/Compiler
https://ghc.haskell.org/trac/ghc/wiki/Commentary/Compiler/HscMain
https://ghc.haskell.org/trac/ghc/wiki/Commentary/Compiler/CoreSynType
https://ghc.haskell.org/trac/ghc/wiki/Commentary/Compiler/StgSynType
https://ghc.haskell.org/trac/ghc/wiki/Commentary/Compiler/CmmType
https://ghc.haskell.org/trac/ghc/wiki/Commentary/Compiler/GeneratedCode
https://ghc.haskell.org/trac/ghc/wiki/Commentary/Compiler/SymbolNames
https://ghc.haskell.org/trac/ghc/wiki/Commentary/Rts
https://ghc.haskell.org/trac/ghc/wiki/Commentary/Rts/Storage/HeapObjects
https://ghc.haskell.org/trac/ghc/wiki/Commentary/Rts/Storage/Stack
https://ghc.haskell.org/trac/ghc/wiki/Commentary/Rts/Storage/GC
https://ghc.haskell.org/trac/ghc/wiki/Commentary/Rts/HaskellExecution
https://ghc.haskell.org/trac/ghc/wiki/Commentary/Rts/HaskellExecution/Registers
https://ghc.haskell.org/trac/ghc/wiki/Commentary/Rts/HaskellExecution/PointerTagging
https://ghc.haskell.org/trac/ghc/wiki/Commentary/Rts/Scheduler
https://ghc.haskell.org/trac/ghc/wiki/Commentary/Rts/STM
https://ghc.haskell.org/trac/ghc/wiki/Commentary/Libraries
References
Source code
[S1] includes/stg/Regs.h
[S2] includes/stg/MachRegs.h
[S3] includes/rts/storage/ClosureTypes.h
[S4] includes/rts/storage/Closures.h
[S5] includes/rts/storage/TSO.h
[S6] includes/rts/storage/InfoTables.h
[S7] compiler/main/DriverPipeline.hs
[S8] compiler/main/HscMain.hs
[S9] compiler/cmm/CmmParse.y.source
[S10] compiler/codeGen/StgCmmForeign.hs
[S11] compiler/codeGen/Stg*.hs
[S12] rts/PrimOps.cmm
[S13] rts/RtsMain.c
[S14] rts/RtsAPI.c
[S15] rts/Capability.h
[S16] rts/Capability.c
[S17] rts/Schedule.c
[S18] rts/StgCRun.c
[S19] rts/StgStartup.cmm
[S20] rts/StgMiscClosures.cmm
[S21] rts/HeapStackCheck.cmm
[S22] rts/Threads.c
[S23] rts/Task.c
[S24] rts/Timer.c
[S25] rts/sm/GC.c
[S26] rts/Sparks.c
[S27] rts/WSDeque.c
[S28] rts/STM.h
[S29] rts/posix/Signals.c
[S30] rts/win32/ThrIOManager.c
[S31] libraries/base/GHC/MVar.hs
[S32] libraries/base/GHC/Conc/IO.hs
[S33] libraries/base/GHC/Conc/Sync.lhs
[S34] libraries/base/GHC/Event/Manager.hs
[S35] libraries/base/GHC/Event/Thread.hs
[S36] libraries/base/GHC/IO/BufferedIO.hs
[S37] libraries/base/GHC/IO/FD.hs
[S38] libraries/base/GHC/IO/Handle/Text.hs
[S39] libraries/base/System/IO.hs
[S40] libraries/base/System/Posix/Internals.hs
[S41] AutoApply.o (utils/genapply/GenApply.hs)
Connect the algorithm and transistor