
1
Belle II Computing and
Requirements of the Network
LHCONE Asia-Pacific workshop
@ Nantou, Taiwan
Belle II computing
resource, design, network
Network data challenge
Trans-Pacific
Trans-Atlantic
LHCONE(-like layer) for Belle II ?
Takanori Hara (KEK)
[email protected]
13 Aug., 2014
Luminosity Prospect
2
[Plot : integrated luminosity (ab^-1) vs calendar year, 2010 - 2022]
SuperKEKB commissioning starts in 2015; physics run starts in 2017
(assuming 9 months/year, 20 days/month of operation)
Belle : ~1 ab^-1
Target integrated luminosity : 50 ab^-1 in 2022
Target instantaneous luminosity : 8 x 10^35 /cm^2/s (KEKB/Belle achieved 2.1 x 10^34 /cm^2/s)

Experiment     | Event size [kB] | Rate @ Storage [event/sec] | Rate @ Storage [MB/sec]
Belle II       | 300             | 6,000                      | 1,800 (@ max. luminosity)
ALICE (Pb-Pb)  | 50,000          | 100                        | 4,000
ALICE (p-p)    | 2,000           | 100                        | 200
ATLAS          | 1,500           | 600                        | 700
CMS            | 1,500           | 150                        | 225 (<~1000)
LHCb           | 55              | 4,500                      | 250
(LHC experiments : as seen in 2011/2012 runs)
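The Belle II entry in this table is just the product of event size and rate to storage; a one-line cross-check, as a sketch using only the numbers quoted above:

```python
# Cross-check of the Belle II row above: storage bandwidth = event size x rate.

event_size_kb = 300           # Belle II event size [kB]
rate_to_storage_hz = 6_000    # Belle II rate to storage [event/sec]

mb_per_s = event_size_kb * rate_to_storage_hz / 1000.0   # kB/s -> MB/s
print(f"Belle II rate to storage: {mb_per_s:.0f} MB/s")  # -> 1800 MB/s (@ max. luminosity)
```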
Hardware Resources for Belle II
3
[Plots vs Year1 - Year7, with the yearly and total integrated luminosity (/ab) overlaid :
 - CPU (kHEPSpec) : Data, Data (reprocess), MC, MC (reproduce), Analysis, Challenge
 - Tape (PB) for raw data : KEK, PNNL, Challenge
 - Disk space (PB) : Data, MC, Analysis, Challenge]
Hardware Resources for Belle II
4
[Same plots as the previous slide, with the 2014 pledge totals of the LHC experiments for comparison :
 - CPU : ATLAS = 975 kHS, CMS = 745 kHS, ALICE = 373 kHS, LHCb = 218 kHS
 - Tape : ATLAS = 81 PB, CMS = 77 PB, ALICE = 28 PB, LHCb = 21 PB
 - Disk : ATLAS = 100 PB, CMS = 62 PB, ALICE = 31 PB, LHCb = 18 PB]
Pledge summary of LHC experiments : http://wlcg-rebus.cern.ch/apps/pledges/summary/
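The "total integrated" curves in the resource plots track the luminosity accumulated up to each year rather than the yearly value. A minimal sketch of that bookkeeping is below; the luminosity profile and the per-ab^-1 coefficients are hypothetical placeholders, not the numbers behind the plots.

```python
# Minimal sketch of how "total integrated" resource curves build up: each year's
# requirement is driven by the luminosity integrated so far. All numbers below
# are hypothetical placeholders, NOT the values of the Belle II resource estimate.

YEARLY_LUMI_AB = [0.3, 1.5, 3.0, 6.0, 8.0, 10.0, 12.0]  # hypothetical ab^-1 per Year1..Year7

RAW_TB_PER_AB = 1000.0   # hypothetical: raw data volume per ab^-1 (TB)
CPU_KHS_PER_AB = 10.0    # hypothetical: processing power per ab^-1 (kHEPSpec)

def integrated_profiles(yearly_lumi):
    """Return (tape_TB, cpu_kHS) per year, both growing with integrated luminosity."""
    tape, cpu, total_lumi = [], [], 0.0
    for lumi in yearly_lumi:
        total_lumi += lumi
        tape.append(total_lumi * RAW_TB_PER_AB)   # raw data is kept, so tape accumulates
        cpu.append(total_lumi * CPU_KHS_PER_AB)   # (re)processing scales with the full dataset
    return tape, cpu

if __name__ == "__main__":
    tape, cpu = integrated_profiles(YEARLY_LUMI_AB)
    for year, (t, c) in enumerate(zip(tape, cpu), start=1):
        print(f"Year{year}: tape ~ {t:,.0f} TB, CPU ~ {c:,.0f} kHS")
```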
Belle II Collaboration
5
23 countries/regions, 97 institutes, 577 colleagues (as of June 30, 2014)
Asia : ~45% (Japan : 137, Korea : 34, Taiwan : 22, India : 20, Australia : 18, China : 15)
N. America : ~15% (US : 63, Canada : 17)
Europe : ~40% (Germany : 83, Italy : 59, Russia : 37, Slovenia : 14, Austria : 14, Poland : 11)
c.f.
ATLAS : 38 countries, 177 institutes, ~3000 members
CMS : 42 countries, 182 institutes, 4300 members
ALICE : 36 countries, 131 institutes, 1200 members
LHCb : 16 countries, 67 institutes, 1060 members
Belle II Computing Model (recap.)
6
until Year 3
[Diagram of the computing model]
Detector → KEK Data Center : raw data storage and processing
PNNL Data Center : raw data duplexing and processing
Raw Data Centers (KEK, PNNL) : raw data storage and (re)processing, MC production,
  physics analysis skims, mdst storage
Regional Data Centers (Asia, Europe 1, Europe 2) : GRID sites with CPU, disk, and tape
MC production sites : GRID, cloud, and computer cluster sites
Local resources : user analysis (Ntuple level)
Data flows : raw data, mdst data, mdst MC (dashed lines : inputs for Ntuple production)
Current status of computing
7
15 countries/regions, 27 sites (+ 2 non-Belle II sites), 70 kHS (100 kHS @ max)
GRID, Cloud, and local cluster resources are available
HEPHY (Vienna) and MPPMU (Munich) joined recently
MC production campaigns : 1st 60M events, 2nd 560M events, 3rd 6200M events
First official release of MC samples : BB generic decay/continuum, tau pair
  (corresponding to 100 fb^-1, w/ and w/o BG)
Trans-Pacific / Trans-Atlantic network data transfer challenge
(c.f. LHCb : 300 kHS, 120 sites; Belle II now : 70 kHS)
Belle II Computing Model (modified)
8
after Year 4 (raw data part)
[Diagram of the modified computing model]
Detector → KEK Data Center (100%) : raw data storage and (re)processing
Raw Data Centers : MC production, physics analysis skims, mdst storage
Second raw data copy distributed over the regions :
  North America : PNNL Data Center (30%), Canada Data Center (10%)
  Europe : Germany Data Center (20%), Italy Data Center (20%)
  Asia : India Data Center (10%), Korea Data Center (10%)
Regional Data Centers : GRID sites, cloud sites, computer cluster sites (MC production)
Local resources : user analysis (Ntuple level)
Raw Data Distribution
9
Until Year 3 : KEK Data Center (100%) and PNNL Data Center (100%, copy from KEK)
Scenario 1 (direct copies from KEK) :
  KEK (100%) → North America : PNNL (30%), Canada (10%)
             → Europe : Germany (20%), Italy (20%)
             → Asia : India (10%), Korea (10%)
Scenario 2 (2-step copy, KEK → PNNL → Europe) :
  KEK (100%) → PNNL (70% → 30%), Canada (10%), India (10%), Korea (10%)
  PNNL → Germany (20%), Italy (20%)
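A minimal sketch of the data-volume bookkeeping implied by these fractions, following the two copy paths above; the yearly raw-data volume used here is a hypothetical placeholder.

```python
# Minimal sketch of the raw-data bookkeeping implied by the fractions above.
# The yearly raw-data volume is a hypothetical placeholder, not a Belle II number.

RAW_DATA_PB_PER_YEAR = 30.0  # hypothetical raw data volume per year (PB)

# second-copy shares after Year 4 (the first full copy stays at KEK)
SHARES = {"PNNL": 0.30, "Canada": 0.10, "Germany": 0.20,
          "Italy": 0.20, "India": 0.10, "Korea": 0.10}

def outbound_volumes(scenario):
    """Return PB/year leaving KEK and PNNL for the two copy scenarios."""
    if scenario == 1:
        # Scenario 1: KEK copies each share directly to every raw data center
        return {"KEK": sum(SHARES.values()) * RAW_DATA_PB_PER_YEAR, "PNNL": 0.0}
    # Scenario 2: KEK sends the European shares via PNNL (2-step copy)
    via_pnnl = SHARES["PNNL"] + SHARES["Germany"] + SHARES["Italy"]   # 70%
    direct   = SHARES["Canada"] + SHARES["India"] + SHARES["Korea"]   # 30%
    return {"KEK": (via_pnnl + direct) * RAW_DATA_PB_PER_YEAR,
            "PNNL": (SHARES["Germany"] + SHARES["Italy"]) * RAW_DATA_PB_PER_YEAR}

for s in (1, 2):
    # total KEK out-bound is the same; what changes is which links carry the European shares
    print(f"Scenario {s}: {outbound_volumes(s)}")
```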
mDST/MC Data Distribution
10
mDST (data) is copied in Asia, Europe, and the USA; for the MC, a similar structure seems natural
  better network ? in each region
  completeness of the dataset in each region
  easier maintenance ?
  imbalance of resources
  data copies between the three regions
Europe (1 set of mDST + MC)
  main centers : GridKa/DESY (Germany), CNAF (Italy)
  SiGNET (Slovenia), CYFRONET/CC1 (Poland), BINP (Russia), HEPHY (Austria), CESNET (Czech Rep.),
  ISMA (Ukraine), INFN Napoli/Pisa/Frascati/Legnaro/Torino (Italy), ULAKBIM (Turkey)
  also : Spain, Saudi Arabia
Asia (1 set of mDST + MC)
  main center : KEK (Japan)
  KISTI (Korea), NTU (Taiwan), Melbourne U. (Australia), IHEP (China), TIFR (India), many Japanese universities
  also : Thailand, Vietnam, Malaysia, ...
America (1 set of mDST + MC)
  main center : PNNL
  U.Vic. / McGill (Canada), VPI, Hawaii, ..., many US universities
  also : Mexico
Scenario 1
11
[Estimated network bandwidth plots vs Year1 - Year7 :
 - Total in-bound [Gbit/s] per raw data center (KEK, PNNL, Germany, Italy, Korea, India, Canada),
   with the traffic to PNNL and to Europe broken down by source site
 - Total out-bound [Gbit/s] from KEK and from PNNL, broken down by destination site
 (sites : KEK, PNNL, Korea, Germany, Italy, Slovenia, Australia, Canada, Austria, China, Czech Rep.,
  India, Malaysia, Mexico, Poland, Russia, Saudi Arabia, Spain*, Taiwan, Thailand*, Turkey)]
Scenario 2
12
[Estimated network bandwidth plots vs Year1 - Year7, same layout as for Scenario 1 :
 total in-bound [Gbit/s] per raw data center (KEK, PNNL, Germany, Italy, Korea, India, Canada)
 and total out-bound [Gbit/s] from KEK and from PNNL, broken down by site]
Network Connectivity
13
Current Connectivity
Trans-Pacific
  10G : Tokyo - LA
  10G : Tokyo - NY
  10G : Osaka - Washington
Trans-Atlantic
  3 x 10G : NY - Amsterdam
  3 x 10G : Washington - Frankfurt
  ANA-100G : NY - Amsterdam
Trans-Asia
  2.5G : Madrid - Mumbai
  2.5G : Singapore - Mumbai
  10G : Japan - Singapore
"Planned" Connectivity
Trans-Pacific
  SINET5 : 100G link to the US in 2016
Trans-Atlantic
  EEX (ESnet Extension to Europe)
  2 x 100G : NY - London
  100G : Washington - Geneva
  40G : Boston - Amsterdam
Trans-Asia
  10G : Mumbai - GEANT
  SINET ?
Trans-Pacific data challenge
14
Setup (KEK-PNNL) in 2013
[Diagram : KEKCC - Nexus 5000 - Intrusion Detection System - Firewall for KEKCC - SINET (Tsukuba DC, Tokyo DC)
 - LAX - ESnet - Firewall - Catalyst 6504 (40G) - PNNL computer]
There are "firewalls" between KEK and PNNL
We need to know the reason for the 500 MB/s limitation :
  Firewall ?
  Sender/receiver hardware (CPU, disk I/O) ?
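One of the suspects above, sender/receiver disk I/O, can be checked independently of the network: time a large sequential read on the transfer host and compare it with the observed 500 MB/s. A minimal sketch; the file path is a hypothetical placeholder, and the file should be larger than RAM so the page cache does not distort the number.

```python
# Quick way to rule disk I/O in or out as the source of a transfer-rate cap:
# measure the sequential read throughput on the sending host and compare it
# with the observed network rate. The file path is a hypothetical placeholder;
# use a file larger than RAM so the page cache does not inflate the result.

import time

TEST_FILE = "/data/belle2/test_raw_file.dat"  # hypothetical large file on the transfer host
CHUNK = 64 * 1024 * 1024                      # 64 MB read blocks

def sequential_read_rate(path):
    """Return the sequential read throughput of `path` in MB/s."""
    total = 0
    start = time.monotonic()
    with open(path, "rb") as f:
        while True:
            block = f.read(CHUNK)
            if not block:
                break
            total += len(block)
    elapsed = time.monotonic() - start
    return (total / 1e6) / elapsed

if __name__ == "__main__":
    rate = sequential_read_rate(TEST_FILE)
    print(f"sequential read: {rate:.0f} MB/s")
    if rate < 600:  # not comfortably above the observed 500 MB/s
        print("disk I/O could plausibly explain a ~500 MB/s cap")
```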
Trans-Pacific data challenge
15
[Map : KEK (Tsukuba, 10 Gbps) - Tokyo - SINET4 : 10 Gbps - PacificWave : 20 Gbps (Seattle / Los Angeles) - PNNL]
KEK (Japan) → PNNL (USA) : 500 MB/s is achieved
  = ~ the required network bandwidth @ early 2018
Also testing the network from PNNL to Europe :
  PNNL (USA) → GridKa (Germany) : 100 MB/s
But this is not enough for the network bandwidth needed from the middle of Year 4 onward (~2 GB/s)
We need a 40 Gbps - 100 Gbps network between Japan and the USA
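The arithmetic behind these last statements, as a small sketch (1 MB = 10^6 bytes):

```python
# Unit bookkeeping: convert achieved/required byte rates into line rates and
# compare them with 10G / 40G / 100G links.

def mb_per_s_to_gbps(megabytes_per_s):
    """MB/s -> Gbit/s (1 MB = 10^6 bytes, 8 bits per byte)."""
    return megabytes_per_s * 8 / 1000.0

achieved = mb_per_s_to_gbps(500)    # 500 MB/s achieved KEK -> PNNL  -> 4 Gbit/s
required = mb_per_s_to_gbps(2000)   # ~2 GB/s needed from mid Year4  -> 16 Gbit/s

print(f"achieved : {achieved:.0f} Gbit/s, required : {required:.0f} Gbit/s")
for link in (10, 40, 100):
    print(f"{link}G link sufficient for the requirement: {link > required}")
```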
New setup (KEK-PNNL)
16
Belle-II testing between PNNL and KEK (setup to stay in place through 30 June 2016)
[Diagram]
Japan : KEK site test subnet 202.13.197.192/26, VLAN 954 to SINET (VRF, AS2907) via Tsukuba and Tokyo
  (point-to-point addresses 202.13.223.133/30 and 202.13.223.134/30)
Trans-Pacific link : SINET LAX router (LAX-dc-GM1.s4.sinet.ad.jp, 202.13.223.117/30) - VLAN 4000
  over CENIC (PacWave) - ESnet pnwg-cr5.es.net (202.13.223.118/30)
  L2 (Ethernet VLAN) connection from SINET to CENIC to support L3 BGP peering between SINET (for KEK) and ESnet
North America : ESnet (AS293) routers sunn-cr5.es.net (SUNN), LOSA, and pnwg-cr5.es.net;
  VLAN 3010 to PNNL-CE2 (VRF), PNNL (AS65428), PNNL site test subnet 198.129.43.0/24
  VRF-to-VRF addresses 192.188.41.1/30 and 192.188.41.2/30; 10 Gbps best-effort LSP between VRFs
Current link is 10GE with shared traffic; upgrade to 100G is in progress (ETA Aug 15, 2014)
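The /30 addresses in this setup are point-to-point link subnets with exactly two usable host addresses. A small check with Python's standard ipaddress module; the address pairs are taken from the diagram, and grouping them as shown is the only assumption here.

```python
# Check that each /30 address pair from the diagram really forms one
# point-to-point link subnet (two usable host addresses).

import ipaddress

P2P_LINKS = {
    "KEK - SINET (VLAN 954)":        ("202.13.223.133", "202.13.223.134"),
    "SINET LAX - ESnet (VLAN 4000)": ("202.13.223.117", "202.13.223.118"),
    "ESnet VRF - PNNL VRF":          ("192.188.41.1",   "192.188.41.2"),
}

for name, (a, b) in P2P_LINKS.items():
    net = ipaddress.ip_network(f"{a}/30", strict=False)   # the /30 containing address a
    hosts = [str(h) for h in net.hosts()]                  # the two usable addresses
    print(f"{name}: {net} hosts {hosts} -> pair consistent: {a in hosts and b in hosts}")
```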
Trans-Atlantic data challenge
17
Test was done in May/June 2014
[Diagram : PNNL to KIT / CNAF / INFN Napoli over a dedicated VLAN (3011)]
US side : PNNL (AS65428) DTN dc.hep.pnnl.org (192.188.41.20), PNNL-CE (192.188.41.17/28), Brocade MLX
  Dedicated 10G link between the PNNL DTN and ESnet
  ESnet (AS293) : pnwg-cr5.es.net - 100G backbone - aofa-cr5.es.net - MANLAN Exchange
  10G best-effort Label Switched Path in the ESnet backbone; Ethernet VLAN bridging across ANA-100 / SURFnet (AS1103)
EU side :
  GEANT (AS20965) : mx1.ams.nl (Juniper T4000) - mx1.fra.de - mx1.gen.ch
  DFN (AS680) : xr-fra1.x-win.dfn.de - kr-fzk.xwin.dfn.de (40G) → KIT (AS34878) / GridKa :
    f01-151-45-e.gridka.de (192.108.45.246), f01-151-10-e.gridka.de (192.108.45.245),
    ppssrm-kit.gridka.de (192.108.45.58), fts3-node1-kit.gridka.de (192.108.45.59), KIT dCache (192.108.46.24)
  GARR (AS137) : rx1.mi1.garr.net - rx1.bo.garr.net - rx1.na1.garr.net →
    CNAF : ds-202-11-03.cnaf.infn.it (131.154.130.76); INFN Napoli (10G) : recasse01.na.infn.it (193.205.223.100)
  Contacts : Chin Guok, Vincenzo Capone, Aleksandr Kurbatov, Mian Usman, Marco Marletta, Thomas Schmid, Hubert Weibel
Setup :
  Network providers set up the VLAN
  Local network providers and sites coordinated the final configurations
  Sites must configure the hardware interfaces to match the destinations
Tools :
  . "traceroute" was used to confirm the routing to each DTN
  . "iperf" was used for the initial network transfer rate test
  . "gridftp" and/or "srm-copy" was used to test each site
  . The FTS3 server at GridKa was used to schedule data transfers
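A minimal sketch of the per-DTN checks listed above (routing with traceroute, then a parallel-stream iperf test). The hostnames are taken from the diagram; an iperf (v2) server is assumed to be already running on each remote DTN, and the gridftp/srm-copy/FTS3 steps are not covered here.

```python
# Sketch of a per-DTN connectivity check: confirm routing with traceroute,
# then run a parallel-stream iperf (v2) client test. Assumes an iperf server
# is already listening on each remote DTN.

import subprocess

DTNS = [
    "dc.hep.pnnl.org",           # PNNL data transfer node
    "fts3-node1-kit.gridka.de",  # KIT / GridKa
    "ds-202-11-03.cnaf.infn.it", # CNAF
]

def check_route(host):
    """Print the route to `host` (standard traceroute)."""
    subprocess.run(["traceroute", host], check=False)

def iperf_test(host, streams=8, seconds=30):
    """Run a parallel-stream iperf (v2) client test against `host`."""
    subprocess.run(["iperf", "-c", host, "-P", str(streams), "-t", str(seconds)], check=False)

if __name__ == "__main__":
    for dtn in DTNS:
        check_route(dtn)
        iperf_test(dtn)
```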
Trans-Atlantic data challenge
18
"iperf" results
  . Required several parallel transfers to reach network saturation
  . Reached ~9.6 Gbps
  [Throughput plot : output ~1.0 GBytes/sec (= 8 Gbps), > 2x the Tier-1 EU site requirements]
Results using the FTS3 server
  [Throughput plot : output 1.0 GBytes/sec (= 8 Gbps), input 0.5 GBytes/sec]
  . FTS3 optimization is not ideal :
    reaches network saturation but falls off very quickly
    large number of dropped packets
  . Satisfies the incoming network requirements for Tier-1 EU sites
    up to calendar year "Year6" (2021 or 2022)
Trans-Atlantic data challenge
19
[ESnet monitoring plots : KIT ~2.0 Gbps inbound; Napoli ~250 MBytes/sec (= 2.0 Gbps)]
Challenges encountered
  . The main issue was the configuration of the local network apparatus
  . Having all the servers at each site use/check the proper network route
  . Hardware limitations (router, storage, etc.)
  . Not having dedicated setups (shared with ATLAS, etc.)
To accommodate the increased rates
  . Modification of the TCP windows was performed at PNNL and in Italy
  . Routing hardware interfaces
  . Configuring/tuning network interrupts for multicore
  . Modification of the FTS3 optimization & global-timeout
LHCONE for Belle II ??
20
LHCONE is for the LHC experiments
In Belle II
  . European sites have already joined LHCONE
  . while KEK and PNNL do not belong to LHCONE yet
Our thoughts
  . Belle II prefers to have a closed network like LHCONE
  . If configuring new VRFs for Belle II at each collaborating site and on the related networks is difficult
    or causes operational problems, one possibility for Belle II is to join LHCONE (if that is allowed)
Considerations : join LHCONE, or configure an LHCONE-like VRF layer ?
  . Many Belle II computing sites overlap with the computing sites of the LHC experiments
    . negotiation with each site could be easier under this umbrella ?
    . is it difficult to expand LHCONE to non-LHC experiments ?
    . configuring another LHCONE-like VRF layer for Belle II could be difficult for some sites ??
  . Belle II traffic would share the same bandwidth with the LHC experiments
    . WAN traffic may be OK ?
    . the traffic pattern is different from the LHC (Japan ↔ US/Europe and US ↔ Europe are the main flows)
  . But we do not have any financial support for this in Belle II
Under these conditions, we want to find a better solution (your comments are highly appreciated)
21
Spare slides
22
Resources at LHC experiments
[Plots, 2009 - 2015, for ALICE, ATLAS, CMS, LHCb : CPU (kHS), Disk (PB), Tape (PB)]
23
DIRAC
 Distributed Infrastructure with Remote Agent Control (developed by LHCb)
 Pilot jobs
 Modular structure that makes it possible to submit jobs to different backends
  (EMI, OSG, computing clusters, clouds) → interoperability in heterogeneous computing environments
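A minimal, generic sketch of the pilot-job idea, not DIRAC's actual API: a pilot is submitted to whichever backend is available (grid, cloud, cluster); once it starts on a worker node it validates the environment and then pulls real payload jobs from a central task queue.

```python
# Generic sketch of the pilot-job pattern used by DIRAC -- not DIRAC's actual API.
# A "pilot" is submitted to a backend; on the worker node it checks the local
# environment and then drains payload jobs from a central queue until empty.

import queue

class TaskQueue:
    """Stand-in for the central job queue (hypothetical)."""
    def __init__(self, payloads):
        self._q = queue.Queue()
        for p in payloads:
            self._q.put(p)

    def fetch(self):
        try:
            return self._q.get_nowait()
        except queue.Empty:
            return None

def pilot(task_queue, node_ok=lambda: True):
    """Run on the worker node: check the environment, then drain matching payloads."""
    if not node_ok():          # e.g. software area, disk space, outbound connectivity
        return 0
    done = 0
    while (payload := task_queue.fetch()) is not None:
        payload()              # execute the real job inside the pilot's sandbox
        done += 1
    return done

if __name__ == "__main__":
    jobs = TaskQueue([lambda i=i: print(f"running payload {i}") for i in range(3)])
    print(f"pilot finished {pilot(jobs)} payloads")
```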
Network Connectivity in Asia
24
25
[GÉANT connectivity map as at January 2014 (www.geant.net)]
The Pan-European Research and Education Network : GÉANT interconnects Europe's National Research and
Education Networks (NRENs), connecting over 50 million users at 10,000 institutions across Europe.
Link speeds shown range from >=1 Gbps up to >=100 Gbps.
GÉANT is operated by DANTE on behalf of Europe's NRENs and is co-funded by the European Union
within its 7th R&D Framework Programme.