第一原理量子モンテカルロ法計算

TTI 2008@ Tuscany, Italy.
Some New Random Number Generators
(Multi Recursive Generator)
Tested on CASINO
Ryo Maezono
Japan Advanced Institute of Science and Technology,
Kanazawa, Japan.
QuickTimeý Dz
TIF F Åià•
èkǻǵÅj êLí£Év ÉçÉOÉâÉÄ
ǙDZÇÃÉsÉNÉ`ÉÉǾå©ÇÈǞǽDžÇÕïKóv Ç­Ç•
ÅB
Collaborators…
QuickTi meý Dz
êLí£É vÉ çÉ OÉ âÉ Ä
ǙDZÇÃ É sÉ NÉ `É É Ç¾ å©ÇÈ ÇžÇ½ Ç…ÇÕï K óvÇ­Ç•
ÅB
Dr. Kenta Hongo
(Maezono group)
Prof. Ken-ichi Miura
(National Institute of Informatics)
QuickTimeý Dz
TIF F Åià•
èkǻǵÅj êLí£Év ÉçÉOÉâÉÄ
ǙDZÇÃÉsÉNÉ`ÉÉǾå©ÇÈǞǽDžÇÕïKóv Ç­Ç•
ÅB
Summary
- A new Random Number Generator (RNG)
“MRG8”
developed by L’Eucuyer and Miura.
- Much Simpler (only 33 lines!) than recent fancy RNG.
Theoretically clear as well.
It Shows good performance...
- Seems equivalent to Ranlux4 with less cost than Ranlux3.
QuickTimeý Dz
TIF F Åià•
èkǻǵÅj êLí£Év ÉçÉOÉâÉÄ
ǙDZÇÃÉsÉNÉ`ÉÉǾå©ÇÈǞǽDžÇÕïKóv Ç­Ç•
ÅB
Random Number Generator
(RNG)
- A kernel component of QMC.
- What is the point for us?
- Auto-correlation length and Bias, mostly.
- No “serious” RNG required such as,
- Physical RNG (based on thermal noise)
- Cryptographic RNG (based on thermal noise)
... These matter only on Cryptograph, Gambles etc.
©前園涼・本郷研太2008
Points for us
- Auto-correlation Length
Better RNG has less correlation.
- Bias
→ Shorter Blocking Length.
→ Effectively costless accumulation.
Worse RNG has less homogeneous distribution
(Sparse Lattice Structure)
→ Sampling would be biased.
→ Biased results.
(3N-dim in our simulation)
©前園涼・本郷研太2008
Representative RNG
(pseudo RNG)
- Linear Congruential Method
Xn  a1 Xn1 mod p
:-) Simple, Costless and Fast.
:-( Sparse Hyper plane → Biased sampling
1) Recursive generation is just 1st. order.
2) Famous story about “RANDU” on IBM360/370.
... because of further reason of bad choice of a1  16807
(N.B., Not the genuine drawback of Congruential method in general)
- Feedback Shift Register Method
... using ‘  ’ with more than two Xn  k 
- Generalization for Higher Order
(Topics of the study here)
... using ‘  ’ with more than two Xn  k 
©前園涼・本郷研太2008
(Not the target of the study here, brief overview)
Further developments (1)
on Feedback Shift Register Method
... using ‘  ’ with more than two Xn  k 
Long period sequence with practically easy implementation.
In practical situation, however, ...
Part of it is used.
Wrong “period” appears...
Further improved GFSRs... (Generalized Feedback Shift Register)
include “Mersenne Twister”
(Matsumoto&Nishimura, 1996)
©前園涼・本郷研太2008
Further developments(2)
on Higher order congruential-based.
... using ‘  ’ with more than two Xn  k 
Include...
- “Fibonacci-based”
- “Subtract-with-Borrow” and its relatives
Further ‘tune-up’-ed → “Ranlux”
(Main target of the study here)
(implemented in CASINO)
- “Multiple Recursive Generator (MRG)”
Xn  a1 Xn 1  a2 Xn 2  L  ak Xn  k  mod p
has been known as a simple/powerful RNG, but...
How to choose coeffs
ak  ?
(Knuth’s criteria)
it had been the practical obstacle.
©前園涼・本郷研太2008
Desired properties of RNG
“Multiplicative Recursive Generator”
has several
Desired properties :
- Costless and Fast.
N.B.) “Mersenne Twister” is said to be ‘Good’ as well.
- Theoretically simple
N.B.) Non-linear congruential methods have no firm theoretical background.
- Easy to be accelerated (Vector/Parallel, discussed later).
©前園涼・本郷研太2008
Choice of coefficients
“Knuth’s criteria”
Xn  a1 Xn 1  a2 Xn 2  L  ak Xn  k  mod p
Choosing coeffs ak 
so that it has
the Longest possible Period
P  pk  1
Characteristic Polynomial
 f z   z k  a1z k 1  L  ak 1z  ak 
Galois Field
is a primitive polynomial on GF(p)
Choose ak  so that f z  can be factorized (mod p) in proper way.
Random Search for ak 
(Pierre L’Ecuyer@Univ. Montreal)
pk  1
~ Factorization of r 
p 1
©前園涼・本郷研太2008
勘所は...
- 計算機による數値演算とは、いわば有限體上での元操作である。
- 疑似乱數生成とは、
有限體上での元から元への射影操作を生成する漸化式のうち
周期が恐ろしく長いものを實現するという事である。
- したがって其の本質は所与の有限體の代數構造で決まっている。
©前園涼・本郷研太2008
MRG8
Xn  a1 Xn 1  a2 Xn 2  L  ak Xn  k  mod p
A good choice obtained for k=8 and p  2  1
31
a5= 189748160
a6= 1984088114
a7= 626062218
a8= 1927846343
a1= 1089656042
a2= 1906537547
a3= 1764115693
a4= 1304127872
(Found/Chosen by P. L’Ecuyer@Univ. Montreal)
A RNG named “MRG8”


8
P  2  1  4.5  10 74
31
(implemented/tested by Prof. Miura)
Only 33 lines !
©前園涼・本郷研太2008
Point (1)
Quality of RNG is critically depending on Coef. Choice.
(Sparse Lattice structure)
a1= 1089656042
a2= 1906537547
a3= 1764115693
a4= 1304127872
a5= 189748160
a6= 1984088114
a7= 626062218
a8= 1927846343
L’Ecuyer’s group carefully choose/test them.
Again...
Famous horror about “RANDU” on IBM360/370.
- Recursive generation is just 1st. order.
- Bad choice of coefficients.
©前園涼・本郷研太2008
Point (2)
Xn  a1 Xn1  a2 Xn 2  L  ak Xn  k  mod p
On 64-bit architecture
31
The choice of p  2  1 makes the implementation quite simple/fast!
(No dividing operation required)
Because...
z  Z63 L Z32 Z31 L Z2 Z1 2 : z1 ·2 31  z2
then,

(binary description on 64-bit architecture)

z  z1 ·2 31  z1  z1  z2  z1 2 31  1  z1  z2 
∴




z mod 2 31  1  z1  z2   shift z,31  and z,2 31  1
(No dividing operation required)
∵)
z1  0L 0Z63 L Z32 2  shift z,31


z2  0L 0Z 31 L Z2 Z1 2  Z63 L Z 32 Z 31 L Z2 Z1 2 .and.0L 01L 1  and z,2 31  1
©前園涼・本郷研太2008
Statistical Tests
Done by Miura or L’Ecuyer…
(To be completed)
©前園涼・本郷研太2008
QMC Test
(combined with CASINO-v1.8)
Tests using G1 set
1.02604152729604E- 04
Ground St. Ene. (a.u.)
SO2 (VMC)
-548.205
(16)
-548.206
-548.207
(# of step = 300 million)
RANLUX-0
(4)
-548.208
MRG8
(Blocking length)
(16)
-1
(1)
-2
(2)
-3
(2)
-548.209
RANLUX-4
©前園涼・本郷研太2008
Notes on Ranlux
“Tuned-up” version of SwB algorithm
(Martin & Luescher, 1993)
Plucking of sequence to reduce auto-correlation
“RANLUX-0” = “Subtract with Borrow”
“RANLUX-1”
More plucking, better performance in Spectrum Test
“RANLUX-2”
“RANLUX-3”
N.B.) “RANLUX-0” = “Subtract with Borrow”
~ “1st. order Linear congruential method”
(an effective implementation with very large prime number)
(Tezuka & L’Ecuyer, 1992)
Consistent with D.P. Landau’s work by Monte Carlo.
©前園涼・本郷研太2008
Tests using G1 set
1.02604152729604E- 04
Ground St. Ene. (a.u.)
SO2 (VMC)
-548.205
(16)
-548.206
-548.207
(# of step = 300 million)
RANLUX-0
(4)
(Blocking length)
(16)
-1
-548.208
MRG8
(1)
-2
(2)
-3
(2)
-548.209
RANLUX-4
... can be viewed as 1st. order linear congruential RNG
as well as the ‘Subtract with Borrow’ RNG.
©前園涼・本郷研太2008
1.02604152729604E- 04
SO2 (DMC)
-548.560
-548.566
(512)
RANLUX-0
(2048)
MRG8
# of config. = 10,000
(Blocking length)
(256)
(512)
-548.562
-548.564
# of step = 40,000
(8192)
-3
(2048)
-1
-2
RANLUX-4
©前園涼・本郷研太2008
Source of Different Bias in DMC/VMC
1) Generating Random Walk (VMC/DMC)
2) Metropolis reject/accept (VMC/DMC)
3) Branching
(DMC)
©前園涼・本郷研太2008
Timing Info
SO2, DMC.
User
MRG8
# of step = 1,000
System
# of config. = 10,000
Total CPU
% CPU
176.0448
189.3554
365.6764
0.42
RANLUX-0
101.2084
190.2169
290.9649
0.33
RANLUX-1
127.7751
192.9557
321.1352
0.36
RANLUX-2
176.5535
183.0453
359.7706
0.42
RANLUX-3
304.1154
182.4181
486.7478
0.57
RANLUX-4
470.0464
182.1101
652.6842
0.76
(second)
©前園涼・本郷研太2008
He & PH2 (VMC)
(Blocking length)
He
1.02604152729604E- 04
# of step = 1000 milion
-2.903686
-342.235
# of step = 150 milion
(1)
-342.236
-2.903688
-2.903690
PH2
1.02604152729604E- 04
(4)
-342.237
(2) (1)
(1)
(1)
-2.903692
MRG8
-0
(1)
-1 -2 -3
RANLUX
(1)
-342.238
-342.239
-4
(4) (2)
(16)
-342.240
MRG8
(1)
-0
-1 -2 -3
RANLUX
-4
©前園涼・本郷研太2008
He & PH2 (DMC)
He
1.02604152729604E- 04
-2.90371
-2.90372
PH2
# of step = 50,000
# of config. = 10,000
(1024)
(1024)
-2.90374
(Blocking length)
(2048)
MRG8
(1024)
-342.473
-342.475
(512)
(4096)
-0
# of step = 30,000
# of config. = 10,000
-342.474
(1024)
-2.90373
1.02604152729604E- 04
-1 -2 -3
RANLUX
(1024)
(512)(1024)
(2048)
(512)
-342.476
-342.478
-4
MRG8
-0
-1 -2 -3
RANLUX
-4
©前園涼・本郷研太2008
System dependence
- (Bias of results) ~ (Homogeneity of RNG)
gets worse in higher dim. of sampling space…
(3N-dim’ in our case)
Systems with larger # of electrons are interesting.
- Sampling space has nodal structure.
“All-electron systems” and “Pseudo Potential systems”
Should examine Both.
differ in its character.
©前園涼・本郷研太2008
SH4 (VMC/DMC, Pseudo Potential calc.)
(VMC)
-6.28928
-6.28944
(32)
(16)
(64)
(32)
(DMC)
-6.3055
(32)
(16)
-6.3060
-6.3065
-6.28952
E/H
E(SiH4-
-6.28936
(Blocking length)
(512)
(1024)
(2048)
(512)
-6.3070
(512)
-6.28960
-6.3075
-6.28968
-6.28976
-6.28984
-6.3080
MRG8
-0
-1
-2 -3
RANLUX
-4
# of step = 10 million
-6.3085
MRG8
(2048)
-0
-1
-2 -3
RANLUX
-4
# of step = 30,000
# of config. = 10,000
©前園涼・本郷研太2008
QMC Test
(combined with CASINO-v1.8)
“A Research Background”
RNG research people are interested in it.
c.f.) D.P. Landau’s test of RNG by QMC (Ising model)
Better RNG in spectrum tests gives
not always better performance on application.
RNG people start to consider
“harmony of RNG” depending on applications”
Discussions
1) MRG8 always give the same answer as Ranlux4
2) MRG8 is costless than Ranlux3.
3) MRG8 gives no significant improvement on Blocking length.
4) Difference of Bias appeared in DMC/VMC
1) Generating Random Walk
(VMC/DMC)
2) Metropolis reject/accept
(VMC/DMC)
3) Branching
(DMC)
©前園涼・本郷研太2008
Acceleration of RNG
Xn  a1 Xn 1  a2 Xn 2  L  ak Xn  k  mod p
can be written as
Generating Sequence
X1,L , X8  X9
X2 ,L , X8 , X9  X10
X2 ,L , X9 , X10  X11
i.e.,
 a1 a2 L
 Xn  
 X   0 1 0L
 n 1   
0
O
 M  M
M
 X
 
n  k 1 
 0 0 0L
ak 
 X n 1 
0
 Xn  2 


M  M 
  X n  k 
1 
: A
 X8 
X 
 7   A  X 9  A  X10  L
 M
 X 
1
(Normal Sequence)
©前園涼・本郷研太2008
Acceleration (cont’d)
 a1 a2 L
 Xn  
 X   0 1 0L
 n 1   
0
O
 M  M
M
 X
 
n  k 1 
 0 0 0L
ak 
 X n 1 
0
 Xn  2 


M  M 
  X n  k 
1 
: A
Having evaluated An in advance...
 X10 
 X9 
 X8 
X 
X 
X 
9
8
2

  A   A  7
 M
 M
 M
 X 
 X 
 X 
3
2
1
 X8  n 
 X8 
X

X 
8  n 1
n

  A  7
 M 
 M
 X 
 X 
1 n
1
©前園涼・本郷研太2008
Acceleration (cont’d)
n
Evaluation of A can be made faster by ...
- Contemporary Compiler (Vectorization)
- Intra-node parallelization (“thread parallel”)
Then...
 X8 
X 
 7
 M
 X 
1
 A  X9  A  X10  A  X11  A  X12  L
(normal ‘serial’ generation)

A
 X9
A
 X13
A
 X17
A2
 X10
A2
 X14
A2
 X18
A
3
 X11
A 4  X12

A
3
 X15
A 4  X16

A
3
 X19
L
A 4  X 20
(Accelerated generation)
©前園涼・本郷研太2008