TTI 2008@ Tuscany, Italy. Some New Random Number Generators (Multi Recursive Generator) Tested on CASINO Ryo Maezono Japan Advanced Institute of Science and Technology, Kanazawa, Japan. QuickTimeý Dz TIF F Åià• èkǻǵÅj êLí£Év ÉçÉOÉâÉÄ Ç™Ç±ÇÃÉsÉNÉ`ÉÉǾå©ÇÈǞǽDžÇÕïKóv ÇÇ• ÅB Collaborators… QuickTi meý Dz êLí£É vÉ çÉ OÉ âÉ Ä Ç™Ç±ÇÃ É sÉ NÉ `É É Ç¾ å©ÇÈ ÇžÇ½ Ç…ÇÕï K óvÇÇ• ÅB Dr. Kenta Hongo (Maezono group) Prof. Ken-ichi Miura (National Institute of Informatics) QuickTimeý Dz TIF F Åià• èkǻǵÅj êLí£Év ÉçÉOÉâÉÄ Ç™Ç±ÇÃÉsÉNÉ`ÉÉǾå©ÇÈǞǽDžÇÕïKóv ÇÇ• ÅB Summary - A new Random Number Generator (RNG) “MRG8” developed by L’Eucuyer and Miura. - Much Simpler (only 33 lines!) than recent fancy RNG. Theoretically clear as well. It Shows good performance... - Seems equivalent to Ranlux4 with less cost than Ranlux3. QuickTimeý Dz TIF F Åià• èkǻǵÅj êLí£Év ÉçÉOÉâÉÄ Ç™Ç±ÇÃÉsÉNÉ`ÉÉǾå©ÇÈǞǽDžÇÕïKóv ÇÇ• ÅB Random Number Generator (RNG) - A kernel component of QMC. - What is the point for us? - Auto-correlation length and Bias, mostly. - No “serious” RNG required such as, - Physical RNG (based on thermal noise) - Cryptographic RNG (based on thermal noise) ... These matter only on Cryptograph, Gambles etc. ©前園涼・本郷研太2008 Points for us - Auto-correlation Length Better RNG has less correlation. - Bias → Shorter Blocking Length. → Effectively costless accumulation. Worse RNG has less homogeneous distribution (Sparse Lattice Structure) → Sampling would be biased. → Biased results. (3N-dim in our simulation) ©前園涼・本郷研太2008 Representative RNG (pseudo RNG) - Linear Congruential Method Xn a1 Xn1 mod p :-) Simple, Costless and Fast. :-( Sparse Hyper plane → Biased sampling 1) Recursive generation is just 1st. order. 2) Famous story about “RANDU” on IBM360/370. ... because of further reason of bad choice of a1 16807 (N.B., Not the genuine drawback of Congruential method in general) - Feedback Shift Register Method ... using ‘ ’ with more than two Xn k - Generalization for Higher Order (Topics of the study here) ... using ‘ ’ with more than two Xn k ©前園涼・本郷研太2008 (Not the target of the study here, brief overview) Further developments (1) on Feedback Shift Register Method ... using ‘ ’ with more than two Xn k Long period sequence with practically easy implementation. In practical situation, however, ... Part of it is used. Wrong “period” appears... Further improved GFSRs... (Generalized Feedback Shift Register) include “Mersenne Twister” (Matsumoto&Nishimura, 1996) ©前園涼・本郷研太2008 Further developments(2) on Higher order congruential-based. ... using ‘ ’ with more than two Xn k Include... - “Fibonacci-based” - “Subtract-with-Borrow” and its relatives Further ‘tune-up’-ed → “Ranlux” (Main target of the study here) (implemented in CASINO) - “Multiple Recursive Generator (MRG)” Xn a1 Xn 1 a2 Xn 2 L ak Xn k mod p has been known as a simple/powerful RNG, but... How to choose coeffs ak ? (Knuth’s criteria) it had been the practical obstacle. ©前園涼・本郷研太2008 Desired properties of RNG “Multiplicative Recursive Generator” has several Desired properties : - Costless and Fast. N.B.) “Mersenne Twister” is said to be ‘Good’ as well. - Theoretically simple N.B.) Non-linear congruential methods have no firm theoretical background. - Easy to be accelerated (Vector/Parallel, discussed later). ©前園涼・本郷研太2008 Choice of coefficients “Knuth’s criteria” Xn a1 Xn 1 a2 Xn 2 L ak Xn k mod p Choosing coeffs ak so that it has the Longest possible Period P pk 1 Characteristic Polynomial f z z k a1z k 1 L ak 1z ak Galois Field is a primitive polynomial on GF(p) Choose ak so that f z can be factorized (mod p) in proper way. Random Search for ak (Pierre L’Ecuyer@Univ. Montreal) pk 1 ~ Factorization of r p 1 ©前園涼・本郷研太2008 勘所は... - 計算機による數値演算とは、いわば有限體上での元操作である。 - 疑似乱數生成とは、 有限體上での元から元への射影操作を生成する漸化式のうち 周期が恐ろしく長いものを實現するという事である。 - したがって其の本質は所与の有限體の代數構造で決まっている。 ©前園涼・本郷研太2008 MRG8 Xn a1 Xn 1 a2 Xn 2 L ak Xn k mod p A good choice obtained for k=8 and p 2 1 31 a5= 189748160 a6= 1984088114 a7= 626062218 a8= 1927846343 a1= 1089656042 a2= 1906537547 a3= 1764115693 a4= 1304127872 (Found/Chosen by P. L’Ecuyer@Univ. Montreal) A RNG named “MRG8” 8 P 2 1 4.5 10 74 31 (implemented/tested by Prof. Miura) Only 33 lines ! ©前園涼・本郷研太2008 Point (1) Quality of RNG is critically depending on Coef. Choice. (Sparse Lattice structure) a1= 1089656042 a2= 1906537547 a3= 1764115693 a4= 1304127872 a5= 189748160 a6= 1984088114 a7= 626062218 a8= 1927846343 L’Ecuyer’s group carefully choose/test them. Again... Famous horror about “RANDU” on IBM360/370. - Recursive generation is just 1st. order. - Bad choice of coefficients. ©前園涼・本郷研太2008 Point (2) Xn a1 Xn1 a2 Xn 2 L ak Xn k mod p On 64-bit architecture 31 The choice of p 2 1 makes the implementation quite simple/fast! (No dividing operation required) Because... z Z63 L Z32 Z31 L Z2 Z1 2 : z1 ·2 31 z2 then, (binary description on 64-bit architecture) z z1 ·2 31 z1 z1 z2 z1 2 31 1 z1 z2 ∴ z mod 2 31 1 z1 z2 shift z,31 and z,2 31 1 (No dividing operation required) ∵) z1 0L 0Z63 L Z32 2 shift z,31 z2 0L 0Z 31 L Z2 Z1 2 Z63 L Z 32 Z 31 L Z2 Z1 2 .and.0L 01L 1 and z,2 31 1 ©前園涼・本郷研太2008 Statistical Tests Done by Miura or L’Ecuyer… (To be completed) ©前園涼・本郷研太2008 QMC Test (combined with CASINO-v1.8) Tests using G1 set 1.02604152729604E- 04 Ground St. Ene. (a.u.) SO2 (VMC) -548.205 (16) -548.206 -548.207 (# of step = 300 million) RANLUX-0 (4) -548.208 MRG8 (Blocking length) (16) -1 (1) -2 (2) -3 (2) -548.209 RANLUX-4 ©前園涼・本郷研太2008 Notes on Ranlux “Tuned-up” version of SwB algorithm (Martin & Luescher, 1993) Plucking of sequence to reduce auto-correlation “RANLUX-0” = “Subtract with Borrow” “RANLUX-1” More plucking, better performance in Spectrum Test “RANLUX-2” “RANLUX-3” N.B.) “RANLUX-0” = “Subtract with Borrow” ~ “1st. order Linear congruential method” (an effective implementation with very large prime number) (Tezuka & L’Ecuyer, 1992) Consistent with D.P. Landau’s work by Monte Carlo. ©前園涼・本郷研太2008 Tests using G1 set 1.02604152729604E- 04 Ground St. Ene. (a.u.) SO2 (VMC) -548.205 (16) -548.206 -548.207 (# of step = 300 million) RANLUX-0 (4) (Blocking length) (16) -1 -548.208 MRG8 (1) -2 (2) -3 (2) -548.209 RANLUX-4 ... can be viewed as 1st. order linear congruential RNG as well as the ‘Subtract with Borrow’ RNG. ©前園涼・本郷研太2008 1.02604152729604E- 04 SO2 (DMC) -548.560 -548.566 (512) RANLUX-0 (2048) MRG8 # of config. = 10,000 (Blocking length) (256) (512) -548.562 -548.564 # of step = 40,000 (8192) -3 (2048) -1 -2 RANLUX-4 ©前園涼・本郷研太2008 Source of Different Bias in DMC/VMC 1) Generating Random Walk (VMC/DMC) 2) Metropolis reject/accept (VMC/DMC) 3) Branching (DMC) ©前園涼・本郷研太2008 Timing Info SO2, DMC. User MRG8 # of step = 1,000 System # of config. = 10,000 Total CPU % CPU 176.0448 189.3554 365.6764 0.42 RANLUX-0 101.2084 190.2169 290.9649 0.33 RANLUX-1 127.7751 192.9557 321.1352 0.36 RANLUX-2 176.5535 183.0453 359.7706 0.42 RANLUX-3 304.1154 182.4181 486.7478 0.57 RANLUX-4 470.0464 182.1101 652.6842 0.76 (second) ©前園涼・本郷研太2008 He & PH2 (VMC) (Blocking length) He 1.02604152729604E- 04 # of step = 1000 milion -2.903686 -342.235 # of step = 150 milion (1) -342.236 -2.903688 -2.903690 PH2 1.02604152729604E- 04 (4) -342.237 (2) (1) (1) (1) -2.903692 MRG8 -0 (1) -1 -2 -3 RANLUX (1) -342.238 -342.239 -4 (4) (2) (16) -342.240 MRG8 (1) -0 -1 -2 -3 RANLUX -4 ©前園涼・本郷研太2008 He & PH2 (DMC) He 1.02604152729604E- 04 -2.90371 -2.90372 PH2 # of step = 50,000 # of config. = 10,000 (1024) (1024) -2.90374 (Blocking length) (2048) MRG8 (1024) -342.473 -342.475 (512) (4096) -0 # of step = 30,000 # of config. = 10,000 -342.474 (1024) -2.90373 1.02604152729604E- 04 -1 -2 -3 RANLUX (1024) (512)(1024) (2048) (512) -342.476 -342.478 -4 MRG8 -0 -1 -2 -3 RANLUX -4 ©前園涼・本郷研太2008 System dependence - (Bias of results) ~ (Homogeneity of RNG) gets worse in higher dim. of sampling space… (3N-dim’ in our case) Systems with larger # of electrons are interesting. - Sampling space has nodal structure. “All-electron systems” and “Pseudo Potential systems” Should examine Both. differ in its character. ©前園涼・本郷研太2008 SH4 (VMC/DMC, Pseudo Potential calc.) (VMC) -6.28928 -6.28944 (32) (16) (64) (32) (DMC) -6.3055 (32) (16) -6.3060 -6.3065 -6.28952 E/H E(SiH4- -6.28936 (Blocking length) (512) (1024) (2048) (512) -6.3070 (512) -6.28960 -6.3075 -6.28968 -6.28976 -6.28984 -6.3080 MRG8 -0 -1 -2 -3 RANLUX -4 # of step = 10 million -6.3085 MRG8 (2048) -0 -1 -2 -3 RANLUX -4 # of step = 30,000 # of config. = 10,000 ©前園涼・本郷研太2008 QMC Test (combined with CASINO-v1.8) “A Research Background” RNG research people are interested in it. c.f.) D.P. Landau’s test of RNG by QMC (Ising model) Better RNG in spectrum tests gives not always better performance on application. RNG people start to consider “harmony of RNG” depending on applications” Discussions 1) MRG8 always give the same answer as Ranlux4 2) MRG8 is costless than Ranlux3. 3) MRG8 gives no significant improvement on Blocking length. 4) Difference of Bias appeared in DMC/VMC 1) Generating Random Walk (VMC/DMC) 2) Metropolis reject/accept (VMC/DMC) 3) Branching (DMC) ©前園涼・本郷研太2008 Acceleration of RNG Xn a1 Xn 1 a2 Xn 2 L ak Xn k mod p can be written as Generating Sequence X1,L , X8 X9 X2 ,L , X8 , X9 X10 X2 ,L , X9 , X10 X11 i.e., a1 a2 L Xn X 0 1 0L n 1 0 O M M M X n k 1 0 0 0L ak X n 1 0 Xn 2 M M X n k 1 : A X8 X 7 A X 9 A X10 L M X 1 (Normal Sequence) ©前園涼・本郷研太2008 Acceleration (cont’d) a1 a2 L Xn X 0 1 0L n 1 0 O M M M X n k 1 0 0 0L ak X n 1 0 Xn 2 M M X n k 1 : A Having evaluated An in advance... X10 X9 X8 X X X 9 8 2 A A 7 M M M X X X 3 2 1 X8 n X8 X X 8 n 1 n A 7 M M X X 1 n 1 ©前園涼・本郷研太2008 Acceleration (cont’d) n Evaluation of A can be made faster by ... - Contemporary Compiler (Vectorization) - Intra-node parallelization (“thread parallel”) Then... X8 X 7 M X 1 A X9 A X10 A X11 A X12 L (normal ‘serial’ generation) A X9 A X13 A X17 A2 X10 A2 X14 A2 X18 A 3 X11 A 4 X12 A 3 X15 A 4 X16 A 3 X19 L A 4 X 20 (Accelerated generation) ©前園涼・本郷研太2008
© Copyright 2024 ExpyDoc