
Lightweight Probabilistic Broadcast
M2 Tatsuya Shirai
M1 Dai Saito
2007/1/15
http://www.logos.ic.i.u-tokyo.ac.jp/~std/lpbcast.ppt
Broadcast in Large-Scale Environments
• End users send messages to all other users more and more frequently.
– P2P BBSs
– Stock markets
• These applications need software-level broadcast.
• Participating processes change more dynamically than processes on servers:
– machine crashes
– logins to and logouts from applications
Deterministic Broadcast
• Each process transfers messages along predefined routes.
• This approach provides a consistent message-delivery ordering.
– Messages from each process arrive in the order in which they were sent.
• Reliability is only "best effort".
Deterministic Broadcast cont.
• Poor scalability
– Single point of failure
– Cost of maintaining routing information
• Low reliability on unstable networks
– Perturbation of a few processes lowers the performance of the healthy processes.
[Figure: throughput vs. rate of perturbed processes]
Probabilistic Broadcast
• Each process transfers messages to randomly selected processes, without predefined routing information.
• The resulting redundancy enhances reliability.
• Reliability is relatively high and stable in large-scale, unstable environments.
Pbcast [Birman et al. 1999]
• This approach uses deterministic and probabilistic broadcast concurrently.
– While network load is low, deterministic broadcast achieves high reliability at low cost.
– While network load is high, probabilistic broadcast ensures a certain level of reliability, especially for healthy processes.
Deterministic Broadcast
• The first protocol is a deterministic broadcast.
• It uses IP multicast or, if that is unavailable, randomly composed spanning trees.
– Composing spanning trees requires full membership information, so this approach is limited to a few hundred processes, as mentioned in the paper.
Anti-Entropy Protocol
• The second is an anti-entropy protocol based on gossip.
– In each round, every member randomly chooses some of the other members and sends them a digest of its message history.
– A process that receives a digest checks it for messages it lacks and requests the missing messages from the original sender.
[Figure: two processes exchange message-history digests (e.g. {5, 8} and {3, 9}), detect the messages they lack, and request them from the sender]
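The digest exchange above can be sketched as follows. This is a minimal, single-machine illustration: the dict layout, the `fanout` value, and the in-place "pull" of missing messages are all assumptions for the sketch, not the paper's wire protocol.

```python
import random

def gossip_round(processes, fanout=2, rng=random):
    """One anti-entropy round: every process sends a digest (the set of
    message IDs it knows) to `fanout` randomly chosen peers; each receiver
    compares the digest with its own history and pulls what it lacks."""
    for p in processes:
        peers = rng.sample([q for q in processes if q is not p], fanout)
        digest = set(p["history"])  # summary of the sender's message history
        for q in peers:
            missing = digest - q["history"]
            # the receiver requests the missing messages from the sender
            q["history"].update(missing)

# three processes with partially overlapping histories
ps = [{"history": {1, 2}}, {"history": {2, 3}}, {"history": set()}]
gossip_round(ps, fanout=2)
```

With three processes and fanout 2, a single round already converges: every history ends up as {1, 2, 3}.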
Anti-Entropy Protocol cont.
• Message size and fanout, the number of processes to which a process sends in one round, determine the network load of this protocol.
• Message size is bounded by a message lifetime on each process.
– A process gossips about a message for a fixed number of rounds after first receiving it.
– After that, the message is given up.
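The round-limited buffering above can be sketched like this; `LIFETIME` and the dict layout are illustrative assumptions, not values from the paper.

```python
LIFETIME = 3  # illustrative number of rounds a message stays buffered

def age_buffer(buffer):
    """Advance one round: age every buffered message and discard the
    ones that have reached their lifetime (they are given up)."""
    kept = {}
    for msg_id, age in buffer.items():
        if age + 1 < LIFETIME:
            kept[msg_id] = age + 1
    return kept

buf = {"m1": 0, "m2": 2}
buf = age_buffer(buf)  # m2 has reached its lifetime and is dropped
```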
Flow Control
• Flow control is applied while the network load is high.
– The rate of pbcast messages is limited; normally one send every 100 ms.
– Retransmissions are delayed by some rounds if many other processes request the same message.
Evaluations
• Parameters:
– Message loss rate
– Fanout, the number of destination processes
• Reliability:
– (# of infected processes − # of failed processes) > (# of all processes)/2
• a criterion for applications based on quorum-replication algorithms
• Throughput:
– The number of messages a process receives in one second.
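The reliability predicate above can be written directly; the function name is illustrative.

```python
def pbcast_reliable(infected, failed, total):
    """Reliability criterion from the evaluation: the broadcast counts as
    successful when the infected processes, not counting failed ones,
    form a majority of all processes (a quorum-style criterion)."""
    return (infected - failed) > total / 2

# 40 of 50 processes infected, 3 of them failed: 37 > 25, so success;
# 20 of 50 infected with no failures: 20 > 25 is false, so failure.
```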
Effects of Fanout
• Predicate I shows pbcast.
– Message loss rate is 0.05.
– Deterministic broadcast reaches 10% of the processes.
– 50 processes participate.
• The probability of failure decreases as the fanout increases up to 8.
[Figure: failure probability vs. fanout (0–10)]
Scalability
• Predicate I shows pbcast.
– Message loss rate is 0.05.
– Deterministic broadcast reaches 10% of the processes.
• The probability of failure decreases at larger scales,
– though broadcasting to all processes takes more rounds.
[Figure: failure probability vs. number of processes (0–60)]
Time for broadcast to all processes
• Messages are received in 12 rounds on average, and in fewer than 20 rounds, at 1024 processes.
– Fanout is 1.
– Deterministic broadcast is not used.
• This result shows the mean latency is O(log N) rounds.
[Figure: rounds (0–20) at 16 to 1024 processes]
Throughput
• 150 messages are sent per second.
– When message loss happens frequently, the fanout is limited to a small value.
• Throughput of perturbed processes decreases, but healthy processes retain full throughput.
[Figure: throughput of pbcast vs. deterministic broadcast against the rate of perturbed processes]
Throughput cont.
• Throughput at 200 msg/sec.
– 25% of the processes are perturbed 25% of the time.
– Deterministic broadcast is unused.
• Frequent packet loss lowers throughput.
• In this case, average throughput decreases to 60% at 96 processes at high bandwidth.
[Figure: throughput vs. loss rate (0–0.2) for 32–96 processes]
Conclusion of pbcast
• The gossip-based protocol achieves scalability and reliability in general network environments.
• However, the cost borne by each process has not been considered. The next topic is memory management for pbcast.
Membership Management
• Assumption: each process knows all members.
– memory consumption at large scale
– communication required to keep the membership consistent
→ scalability problems in large-scale environments
Membership Management of lpbcast
• Membership management + gossip
– Each process knows only a subset of all members.
– Messages are sent together with member information.
– The membership-management buffer has a limited size.
• Fixed memory consumption
Memory Management
• The memory requirement of a process should not change (at large scale):
– buffer for membership management
– buffer for outgoing messages
→ scalability
• lpbcast is pbcast from the viewpoint of "memory consumption".
lpbcast algorithm
• Assumptions
– Each process has a unique ID.
– Each message has a unique ID (including the process ID).
– Processes join and leave (= subscribe and unsubscribe).
• Buffers
– Events: event notifications
– EventIDs: IDs of delivered events
– Subs: subscription information
– unSubs: unsubscription information
– View: targets of gossip messages
• All buffers have a size limit, especially Events and Subs.
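The per-process state implied by the buffer list above can be sketched as a small class; the field names are illustrative, not the paper's identifiers.

```python
class LpbcastState:
    """Minimal sketch of one lpbcast process's bounded state."""
    def __init__(self, pid):
        self.pid = pid          # unique process ID (an lpbcast assumption)
        self.events = {}        # Events: event ID -> notification payload
        self.event_ids = set()  # EventIDs: IDs of already-delivered events
        self.subs = set()       # Subs: known subscription information
        self.unsubs = set()     # unSubs: recent unsubscriptions
        self.view = set()       # View: bounded set of gossip targets

s = LpbcastState("p1")
```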
sending
• lpbcast(e)
– Add e to Events.
• Periodic gossip
– Send the buffers (Events, EventIDs, Subs, unSubs) to a subset of View (every 50 ms).
[Figure: a gossip message carrying the buffers is sent to processes in View]
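The sending side above can be sketched with two functions; the dict layout, the `fanout` value, and the piggybacking of the sender's own ID onto Subs are assumptions of this sketch.

```python
import random

def lpbcast(state, event_id, payload):
    """Application-level broadcast: just buffer the event;
    the periodic gossip task disseminates it."""
    state["events"][event_id] = payload

def gossip(state, fanout=2, rng=random):
    """Periodic task (every ~50 ms in the paper): build a gossip message
    from the buffers and pick a random subset of View as targets."""
    targets = rng.sample(sorted(state["view"]), min(fanout, len(state["view"])))
    msg = {
        "events": dict(state["events"]),
        "event_ids": set(state["event_ids"]),
        "subs": set(state["subs"]) | {state["pid"]},  # advertise ourselves
        "unsubs": set(state["unsubs"]),
    }
    return targets, msg

s = {"pid": "p1", "events": {}, "event_ids": set(),
     "subs": set(), "unsubs": set(), "view": {"p2", "p3"}}
lpbcast(s, "e1", "hello")
targets, msg = gossip(s)
```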
receiving
• When receiving a gossip message Mes…
– Membership management
• Add Mes.unSubs to unSubs; remove Mes.unSubs from View and Subs.
• Add Mes.Subs to View and Subs.
• If View becomes too large, move some items to Subs at random.
[Figure: Mes.unSubs and Mes.Subs are merged into the local View and Subs]
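The membership steps above can be sketched as one update function; `MAX_VIEW` and the dict layout are illustrative assumptions.

```python
import random

MAX_VIEW = 2  # illustrative cap on the view size

def apply_membership(state, msg, rng=random):
    """Membership part of gossip reception: record unsubscriptions,
    drop them from View and Subs, merge new subscriptions, and if the
    view overflows, evict random entries (they survive in Subs)."""
    state["unsubs"] |= msg["unsubs"]
    state["view"] -= msg["unsubs"]
    state["subs"] -= msg["unsubs"]
    state["view"] |= msg["subs"] - {state["pid"]}
    state["subs"] |= msg["subs"] - {state["pid"]}
    while len(state["view"]) > MAX_VIEW:
        victim = rng.choice(sorted(state["view"]))
        state["view"].discard(victim)  # still known via Subs

s = {"pid": "p1", "view": {"p2"}, "subs": {"p2"}, "unsubs": set()}
apply_membership(s, {"subs": {"p3", "p4"}, "unsubs": {"p2"}})
```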
receiving
• When receiving a gossip message Mes…
– Event transmission
• Events received for the first time are transmitted to the other processes in View.
• If Events becomes too large, entries are removed at random.
– Event retrieval
• On receiving an undelivered event ID in Mes.EventIDs, the process issues a request to retrieve the event.
[Figure: an unknown ID in EventIDs triggers retrieval of the event from another process]
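The event-handling steps above can be sketched the same way; `MAX_EVENTS` and the return value are assumptions of this sketch (the real protocol sends retrieval requests over the network).

```python
import random

MAX_EVENTS = 2  # illustrative cap on the Events buffer

def apply_events(state, msg, rng=random):
    """Event part of gossip reception: deliver first-seen events and keep
    them for further gossiping, trim Events randomly on overflow, and
    collect IDs that must be retrieved from other processes."""
    for eid, payload in msg["events"].items():
        if eid not in state["event_ids"]:
            state["event_ids"].add(eid)     # delivered for the first time
            state["events"][eid] = payload  # kept for re-gossiping
    while len(state["events"]) > MAX_EVENTS:
        victim = rng.choice(sorted(state["events"]))
        del state["events"][victim]
    # IDs seen in the digest but never delivered must be fetched
    return [eid for eid in msg["event_ids"] if eid not in state["event_ids"]]

s = {"events": {}, "event_ids": set()}
need = apply_events(s, {"events": {"e1": "a"}, "event_ids": {"e1", "e9"}})
```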
subscribing
• A subscribing process must know at least one of the current members.
• It sends a gossip message with itself appended to Subs.
• On timeout, it retransmits.
[Figure: the new process is gradually added to other processes' views]
unsubscribing
• The process sends a gossip message with itself appended to unSubs.
– It is gradually removed from individual views.
– unSubs entries carry a timeout.
– Assumption: a removed process will not recover soon.
[Figure: the unSubs entry propagates from view to view by gossip]
features of lpbcast
• Throughput is as high as pbcast's.
• Memory consumption can be estimated.
• The membership algorithm and the dissemination of events are dealt with at the same level.
• Each view is an independent, uniform sample.
– a true P2P model
→ suitable for WANs
– "locality" still needs to be recognized.
Optimization
• Age-based
– Optimization of the Events buffer
– Currently, the Events buffer is purged at random.
→ It is better to remove well-disseminated messages first.
– Age = # of hops
[Figure: P1 broadcasts m1 and P2 broadcasts m2; m2's age grows as it is gossiped between them]
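The age-based purge can be sketched as follows; `MAX_EVENTS` and the dict layout are illustrative assumptions.

```python
MAX_EVENTS = 2  # illustrative cap on the Events buffer

def purge_by_age(events):
    """events: dict mapping event ID -> age (# of gossip hops).
    On overflow, drop the oldest events first, since a high age means
    the event is likely already widely disseminated."""
    while len(events) > MAX_EVENTS:
        oldest = max(events, key=lambda eid: events[eid])
        del events[oldest]
    return events

buf = {"m1": 5, "m2": 1, "m3": 3}
purge_by_age(buf)  # m1, the oldest, is removed
```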
Optimization
• Frequency-based
– Optimization of the Subs buffer
– Currently, the Subs buffer is purged at random.
→ It is better to remove well-known processes first.
– well-known = frequently included in Subs buffers
[Figure: P2 appears in both P1's and P3's Subs lists, so it counts as well known]
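The frequency-based purge mirrors the age-based one; `MAX_SUBS` and the counting scheme are illustrative assumptions.

```python
MAX_SUBS = 2  # illustrative cap on the Subs buffer

def purge_by_frequency(subs_counts):
    """subs_counts: dict mapping process ID -> times seen in incoming
    Subs lists. On overflow, drop the most frequently seen processes
    first: a well-known process needs no further advertising."""
    while len(subs_counts) > MAX_SUBS:
        best_known = max(subs_counts, key=lambda pid: subs_counts[pid])
        del subs_counts[best_known]
    return subs_counts

subs = {"p1": 7, "p2": 2, "p3": 4}
purge_by_frequency(subs)  # p1, the best-known process, is removed
```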
Experiment : # of rounds
• Simulation
– Probability of message loss: 0.05; fanout = 3
– Probability of process crash: 0.01
• Measures the number of rounds needed to disseminate to 99% of all processes.
• The number of rounds grows logarithmically.
Experiment : Reliability
– Sun Ultra 10 (Solaris 2.6, 256 MB of memory)
– 100 Mbps Ethernet
– 40 msg/round, |Events| = 60
• Measures the probability that any given process delivers any given event notification.
Experiment : Optimization Effect
• Age-based optimization outperforms random purging.
– Delivery ratio = (# of delivered messages) / (# of broadcasts)
– 30 msg/round, |Events| = 30, fanout = 4, 60 processes
[Figure: delivery ratio of the optimized strategy vs. random purging]
Conclusion
• Scalability + reliability
• Bimodal Multicast (pbcast)
– A gossip-based protocol achieves scalability and reliability.
• Lightweight Probabilistic Broadcast (lpbcast)
– pays attention to the cost at each process: memory management for pbcast
– lightweight in large-scale environments