Supplementary Table 1 Libraries used in construction of physical maps

Library name      Clone ID      Vector    Enzyme        Ave. insert  No. of   Fingerprinted  End sequenced  Clones
                                                        size (kb)    clones                                 sequenced
RGP YAC(a)        Y             YAC       EcoRI, NotI   350          6932     -              -              -
RGP PAC(b)        P             PAC       Sau3AI        112          71040    Partly         Partly         907
RGP BAC(c)        B             BAC       MboI          124          48960    Partly         Partly         222
CUGI BAC(d)       OSJNBa        BAC       HindIII       129          36864    Fully          Fully          1081
CUGI BAC          OSJNBb        BAC       EcoRI         119          55296    Fully          Fully          553
Monsanto BAC(e)   OJ            BAC       HindIII       -            3416     Fully          Fully          638
AGI fosmid(f)     OSJNOa        Fosmid    Sheared DNA   41           110592   -              -              18
CUGI plasmid(g)   OSJNPb        Plasmid   HaeIII        10           165888   -              -              -
CUGI plasmid      OSJNPc        Plasmid   Sau3AI        10           138240   -              -              -
Others(h)         OSJNA, OJA    -         -             -                     -              -              34
Total                                                                630296                                 3453

(a) Rice Genome Research Program YAC: Saji, S., et al. Genome 44, 32-37 (2001)
(b) Rice Genome Research Program PAC: Baba, T., et al. Bull. Natl. Inst. Agrobiol. Resour. (Japan) 14, 41-51 (2000)
(c) Rice Genome Research Program BAC: Wu, J., et al. Plant J. 36, 720-730 (2003)
(d) Clemson University Genomics Institute BAC: Chen, M., et al. Plant Cell 14, 537-545 (2002)
(e) Monsanto BAC: Barry, G. F., Plant Physiol. 125, 1164-1165 (2001)
(f) Arizona Genomics Institute fosmid: http://www.genome.arizona.edu/orders/direct.html?library=OSJNOa
(g) Clemson University Genomics Institute plasmid: Yang, T. J., et al. Theor. Appl. Genet. 107, 652-660 (2003)
(h) OSJNA and OJA: Artificial gap-filling clones.

Supplementary Table 2 Sequence quality based on overlapping sequences

Chr  Overlapping clones                                   Overlap sequence  Base substitutions  Insertion/deletion
                                                          (bp)              (bp)                (bp)
1    OSJNBa0049B20(TIGR)       OSJNBa0004G10(RGP)         117850            2                   8
1    OSJNBa0049B20(TIGR)       P0034C11(RGP)              136361            1                   8
1    OSJNBa0048I01(KRGP)       P408G07(RGP)               185780            7                   32
1    OSJNBa0048I01(KRGP)       B1099D03(RGP)              58337             0                   7
2    OSJNBa0049B20(CSHL)       OJ1111_C07(RGP)            150046            0                   40
6    OJ1540_H01(TIGR)          P0481E08(RGP)              57158             0                   9
6    OJ1540_H01(TIGR)          P0541C02(RGP)              88194             0                   0
7    OSJNBb0024A20(ACWW)       OSJNBa0072I06(RGP)         19465             0                   0
7    OSJNBb0024A20(ACWW)       OSJNBb0018L13(RGP)         129833            0                   20
10   OJ1004_F02(TIGR)          OSJNBa0014J14(ACWW)        85048             0                   0
10   OSJNBa0093I09(ACWW)       OSJNBa0073L20(TIGR)        82117             0                   10
11   OSJNBa0052C03(RGIR)       OSJNBa0052C16(TIGR)        45294             0                   0
11   OSJNBa0094P07(TIGR)       OSJNBb0088N01(PGIR)        71138             0                   0
11   OSJNBa0025K19(Genoscope)  OSJNBb0004B05(PGIR)        21264             0                   0
Totals                                                    1247885           10                  134

Accuracy based on base pair discrepancies (%): 99.9992
Accuracy based on both substitutions and insertions/deletions (%): 99.9885

Supplementary Table 3 CentO (155 bp satellite DNA) units within chromosome pseudomolecules

Chr  Physical map  CentO sequence location(a) (bp)  Total units  Total amount (bp)  Identity(b) (%)
1    partial       16682539-17130082                1055         163525             82.3-97.6
2    partial       13570041-13857135                1297         201035             84.0-97.9
3    partial       19346905-19452749                158          24490              80.2-97.2
4    complete      9808276-9933189                  355          55025              82.2-96.7
5    complete      12357944-12421998                325          50375              84.3-96.5
6    partial       15266486-15272427                39           6045               81.2-97.4
7    partial       11992162-12227761                578          89590              84.3-97.8
8    complete      12913343-13833051                443          68665              81.9-99.3
9    partial       2697206-2927046                  776          120280             85.3-95.0
10   partial       7701335-8609236                  7            1085               83.0-92.9
11   partial       11939274-11939374                -            -                  -
12   partial       11771323-12117142                814          126170             79.4-97.8
All                                                 5847         906285

(a) Pseudomolecule coordinates.
(b) The 155-bp consensus CentO sequence analyzed from the centromeric region of chromosome 8 was used for BLAST analysis.
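The two accuracy figures of Supplementary Table 2 follow from its totals row, and the "Total amount" column of Supplementary Table 3 is the unit count multiplied by the 155-bp CentO unit length. The following is a minimal Python sketch of that arithmetic; the function names are illustrative and are not part of the original analysis.

```python
CENTO_UNIT_BP = 155  # CentO satellite unit length stated in Supplementary Table 3


def accuracy_percent(overlap_bp: int, discrepancies_bp: int) -> float:
    """Percentage of overlapping bases that agree between two clones."""
    return 100.0 * (1.0 - discrepancies_bp / overlap_bp)


def cento_amount_bp(units: int) -> int:
    """Total CentO sequence implied by a number of 155-bp units."""
    return units * CENTO_UNIT_BP


# Totals row of Supplementary Table 2
overlap, substitutions, indels = 1_247_885, 10, 134
substitution_accuracy = accuracy_percent(overlap, substitutions)
combined_accuracy = accuracy_percent(overlap, substitutions + indels)
print(f"{substitution_accuracy:.4f} {combined_accuracy:.4f}")  # 99.9992 99.9885
```

Both printed values match the percentages reported under Supplementary Table 2, and `cento_amount_bp(1055)` gives the 163525 bp listed for chromosome 1 in Supplementary Table 3.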
Supplementary Table 4 Chromosomal distribution of gene models

Chromosome  Length (bp)  Predicted models  Gene density (kbp/gene)
1           43260640     4856              8.9
2           35954074     3964              9.1
3           36189985     4159              8.7
4           35489479     3400              10.4
5           29733216     2956              10.1
6           30731386     3079              10
7           29643843     3044              9.7
8           28434680     2708              10.5
9           22692709     2175              10.4
10          22683701     2185              10.4
11          28357783     2650              10.7
12          27561960     2368              11.6
Total       370733456    37544             9.9

Supplementary Table 5 Statistics for the predicted genes in the rice pseudomolecules and a comparison with Arabidopsis thaliana

                                Rice pseudomolecules  Arabidopsis genome(a)
Length (bp)                     370,733,456           115,409,949
Predicted genes
  Number                        37,544                25,498
  Gene density (kb per gene)    9.9                   4.5
  Average gene length (bp)      2,699                 1,992
Exons
  Number                        175,203               132,982
  Total length (bp)             44,492,676            33,249,250
  Average per gene              4.7                   5.2
  Average size (bp)             254                   250
Introns
  Number                        137,659               107,484
  Total length (bp)             56,841,388            18,055,421
  Average per gene              3.7                   4.2
  Average size (bp)             413                   168
Base composition (GC %)
  Exon                          54.2
  Intron                        38.3
  Intergenic                    42.9
  Gene                          45.3
  Genome                        43.6
  Arabidopsis GC values, in source order (row assignment unclear in the extracted text): 44.1 32.7 34.7

(a) Arabidopsis Genome Initiative. Nature 408, 796-815 (2000).

Supplementary Table 6 Coverage of FGENESH models with FL-cDNAs

Cutoff length (%)  Alignments  FGENESH models  FL-cDNAs
25                 30,253      17,016          25,636
50                 23,581      14,907          22,046
75                 17,690      11,534          16,719
80                 16,400      10,806          15,513

FGENESH models (37,544) were searched using BLASTN against the collection of 32,767 FL-cDNAs. The alignments were parsed using 95% identity over variable length cutoffs.
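The gene-density column of Supplementary Table 4 is the chromosome length divided by the number of predicted models, expressed in kb per gene. A small Python sketch, assuming that definition (the helper name `gene_density_kb` is illustrative):

```python
def gene_density_kb(length_bp: int, gene_models: int) -> float:
    """Average spacing of gene models in kb per gene."""
    return length_bp / gene_models / 1000.0


# (length in bp, predicted models) copied from Supplementary Table 4
table4 = {
    "1": (43_260_640, 4856),
    "12": (27_561_960, 2368),
    "Total": (370_733_456, 37_544),
}

densities = {chrom: round(gene_density_kb(*row), 1) for chrom, row in table4.items()}
print(densities)  # {'1': 8.9, '12': 11.6, 'Total': 9.9}
```

The rounded values agree with the densities reported for chromosomes 1 and 12 and for the whole genome.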
Supplementary Table 7 The 50 most frequent domains detected by Interpro

No.  IPR id     Description                                                         Gene models
1    IPR011009  Protein kinase-like                                                 1425
2    IPR000719  Protein kinase                                                      1366
3    IPR002290  Serine/threonine protein kinase                                     1286
4    IPR001245  Tyrosine protein kinase                                             1264
5    IPR008271  Serine/threonine protein kinase, active site                        1075
6    IPR001611  Leucine-rich repeat                                                 837
7    IPR001810  Cyclin-like F-box                                                   620
8    IPR008941  TPR-like                                                            572
9    IPR007090  Leucine-rich repeat, plant specific                                 431
10   IPR009057  Homeodomain-like                                                    425
11   IPR002885  PPR repeat                                                          423
12   IPR001841  Zn-finger, RING                                                     401
13   IPR002182  NB-ARC                                                              396
14   IPR000767  Disease resistance protein                                          361
15   IPR008938  ARM repeat fold                                                     337
16   IPR001128  Cytochrome P450                                                     331
17   IPR001005  Myb, DNA-binding                                                    328
18   IPR003591  Leucine-rich repeat, typical subtype                                321
19   IPR008940  Protein prenyltransferase                                           305
20   IPR000504  RNA-binding region RNP-1 (RNA recognition motif)                    279
21   IPR002401  E-class P450, group I                                               274
22   IPR000345  Cytochrome c heme-binding site                                      262
23   IPR009007  Peptidase aspartic                                                  242
24   IPR003593  AAA ATPase                                                          240
25   IPR001680  G-protein beta WD-40 repeat                                         233
26   IPR002048  Calcium-binding EF-hand                                             218
27   IPR002110  Ankyrin                                                             214
28   IPR000379  Esterase/lipase/thioesterase                                        210
29   IPR011046  WD40-like                                                           188
30   IPR010983  EF-Hand-like                                                        187
31   IPR007087  Zn-finger, C2H2 type                                                174
32   IPR001092  Basic helix-loop-helix dimerisation region bHLH                     162
33   IPR001471  Pathogenesis-related transcriptional factor and ERF                 157
34   IPR002016  Haem peroxidase, plant/fungal/bacterial                             154
35   IPR001878  Zn-finger, CCHC type                                                154
36   IPR003612  Plant lipid transfer/seed storage/trypsin-alpha amylase inhibitor   150
37   IPR002213  UDP-glucuronosyl/UDP-glucosyltransferase                            146
38   IPR008974  TRAF-like                                                           139
39   IPR010255  Haem peroxidase                                                     139
40   IPR001440  TPR repeat                                                          139
41   IPR008985  Concanavalin A-like lectin/glucanase                                137
42   IPR001687  ATP/GTP-binding site motif A (P-loop)                               137
43   IPR003439  ABC transporter                                                     129
44   IPR000210  BTB/POZ                                                             127
45   IPR008994  Nucleic acid-binding OB-fold                                        124
46   IPR000823  Plant peroxidase                                                    123
47   IPR011011  FYVE/PHD zinc finger                                                122
48   IPR001410  DEAD/DEAH box helicase                                              120
49   IPR001480  Curculin-like (mannose-binding) lectin                              119
50   IPR003441  No apical meristem (NAM) protein                                    117

Supplementary Table 8 Cereal-specific proteins

Protein                                   Number
Abscisic stress ripening protein          4
Chitinase precursor                       3
Citrate binding protein precursor         1
Endonuclease                              1
Glucan 1,3-beta-glucosidase precursor     3
Heterogeneous nuclear ribonucleoprotein   1
Jasmonate-induced protein                 4
Mannosyltransferase                       1
Pathogenesis-related protein PR-10a       5
Phytosulfokines precursor                 1
Prolamin                                  31
Proteinase inhibitor                      10
Queuine tRNA-ribosyltransferase           2
Ribosome-inactivating protein             1
SAM-dependent methyltransferase           1
Seed allergen                             5
Starch branching enzyme                   1
Wound-induced protease inhibitor          1

Supplementary Table 9 Motifs in tandemly repeated gene families
IPR Number(a)
IPR011009 IPR002290 IPR001245 IPR008271 IPR011009 IPR001245 IPR002290 IPR000719 IPR008271 IPR000719 IPR011009 IPR002290 IPR001245 IPR008271 IPR002182 IPR000767 IPR001611 IPR001611 IPR007090 IPR011009 IPR002290 IPR001245 IPR000719 IPR008271 IPR003591 IPR011009 IPR000719 IPR001245 IPR002290 IPR008271 IPR011009 IPR000719 IPR002290 IPR001245 IPR008271 IPR002885 IPR008941 IPR008940 IPR011009 IPR002290 IPR001245 IPR000719 IPR000719 IPR011009 IPR002290 IPR001245 IPR008271 IPR000719 IPR002290 IPR011009 IPR001245 IPR002885 IPR008941 IPR008940 IPR001245
Motif Description
Protein kinase-like Serine/threonine protein kinase Tyrosine protein kinase Serine/threonine protein kinase, active Protein kinase-like Tyrosine protein kinase Serine/threonine protein kinase Protein kinase Serine/threonine protein kinase, active Protein kinase Protein kinase-like Serine/threonine protein kinase Tyrosine protein kinase Serine/threonine protein kinase, active NB-ARC Disease resistance protein Leucine-rich repeat Leucine-rich repeat Leucine-rich repeat, plant specific Protein kinase-like Serine/threonine protein kinase
Ty rosine protein kinase Protein kinase Serine/threonine protein kinase, activ e Leucine-rich repeat, ty pical subty pe Protein kinase-like Protein kinase Ty rosine protein kinase Serine/threonine protein kinase Serine/threonine protein kinase, activ e Protein kinase-like Protein kinase Serine/threonine protein kinase Ty rosine protein kinase Serine/threonine protein kinase, activ e PPR repeat TPR-like Protein preny ltransf erase Protein kinase-like Serine/threonine protein kinase Ty rosine protein kinase Protein kinase Protein kinase Protein kinase-like Serine/threonine protein kinase Ty rosine protein kinase Serine/threonine protein kinase, activ e Protein kinase Serine/threonine protein kinase Protein kinase-like Ty rosine protein kinase PPR repeat TPR-like Protein preny ltransf erase Ty rosine protein kinase site site site site site site site Gene models with Gene models in motif /domain tandem array s 125 134 122 134 121 134 107 134 92 109 91 109 91 109 91 109 84 109 85 100 85 100 81 100 80 100 70 100 77 91 66 91 64 91 57 73 55 73 53 73 53 73 53 73 53 73 45 73 44 73 56 69 56 69 55 69 54 69 49 69 55 57 54 57 53 57 53 57 46 57 51 51 43 51 37 51 50 50 50 50 50 50 50 50 45 50 45 50 45 50 45 50 37 50 41 45 41 45 41 45 41 45 44 44 36 44 29 44 42 43 IPR000719 IPR011009 IPR002290 IPR008271 IPR008974 IPR000210 IPR002083 IPR002182 IPR000767 IPR001611 IPR011009 IPR001245 IPR000719 IPR002290 IPR008271 IPR001128 IPR002401 IPR001245 IPR002290 IPR011009 IPR000719 IPR008271 IPR001611 IPR007090 NO IPR IPR001128 IPR002401 IPR002182 IPR000767 IPR001611 IPR002885 IPR008941 IPR008940 IPR002182 IPR000767 IPR004045 IPR010987 IPR004046 IPR002952 IPR001245 IPR000719 IPR002290 IPR011009 IPR008271 IPR011009 IPR002290 IPR000719 IPR001245 IPR008271 IPR007658 IPR002885 IPR008941 IPR008940 NO IPR IPR002182 IPR000767 IPR001283 NO IPR Protein kinase Protein kinase-like Serine/threonine protein kinase Serine/threonine protein kinase, activ e site TRAF-like BTB/POZ MATH NB-ARC Disease 
resistance protein Leucine-rich repeat Protein kinase-like Ty rosine protein kinase Protein kinase Serine/threonine protein kinase Serine/threonine protein kinase, activ e site Cy tochrome P450 E-class P450, group I Ty rosine protein kinase Serine/threonine protein kinase Protein kinase-like Protein kinase Serine/threonine protein kinase, activ e site Leucine-rich repeat Leucine-rich repeat, plant specif ic 42 42 42 34 40 40 35 40 39 31 37 37 37 37 33 36 33 31 31 31 31 25 24 24 Cy tochrome P450 E-class P450, group I NB-ARC Disease resistance protein Leucine-rich repeat PPR repeat TPR-like Protein preny ltransf erase NB-ARC Disease resistance protein Glutathione S-transf erase, N-terminal Glutathione S-transf erase, C-terminal-like Glutathione S-transf erase, C-terminal Eggshell protein Ty rosine protein kinase Protein kinase Serine/threonine protein kinase Protein kinase-like Serine/threonine protein kinase, activ e site Protein kinase-like Serine/threonine protein kinase Protein kinase Ty rosine protein kinase Serine/threonine protein kinase, activ e site Protein of unknown f unction DUF594 PPR repeat TPR-like Protein preny ltransf erase 32 29 29 28 25 30 26 19 27 26 28 28 27 23 27 27 27 27 19 23 23 23 23 20 20 26 21 17 NB-ARC Disease resistance protein Allergen V5/Tpx-1 related 21 20 23 43 43 43 43 42 42 42 41 41 41 37 37 37 37 37 36 36 35 35 35 35 35 35 35 32 32 32 31 31 31 30 30 30 29 29 28 28 28 28 27 27 27 27 27 27 27 27 27 27 26 26 26 26 25 25 25 23 23 IPR004320 IPR002885 IPR008941 IPR008940 IPR001810 IPR002182 IPR000767 IPR001611 IPR002213 IPR002885 IPR008941 IPR002885 IPR008941 IPR008940 NO IPR IPR005299 IPR002213 IPR002401 IPR001128 IPR000767 IPR002182 IPR001611 NO IPR IPR002182 IPR000767 IPR001611 NO IPR IPR000767 IPR002182 IPR001611 NO IPR NO IPR IPR001611 IPR007090 IPR003591 IPR002016 IPR010255 IPR000823 NO IPR IPR003612 IPR000719 IPR008271 IPR011009 IPR001245 IPR002290 IPR002885 IPR008941 IPR008940 IPR009007 IPR002110 IPR001810 IPR001810 IPR011009 
IPR000719 IPR002290 IPR001245 IPR008271 IPR000490 Arabidopsis conserv ed protein PPR repeat TPR-like Protein preny ltransf erase Cy clin-like F-box NB-ARC Disease resistance protein Leucine-rich repeat UDP-glucuronosy l/UDP-glucosy ltransf erase PPR repeat TPR-like PPR repeat TPR-like Protein preny ltransf erase 20 23 22 15 20 22 22 20 16 22 21 21 20 14 SAM dependent carboxy l methy ltransf erase UDP-glucuronosy l/UDP-glucosy ltransf erase E-class P450, group I Cy tochrome P450 Disease resistance protein NB-ARC Leucine-rich repeat 18 18 20 20 20 19 18 NB-ARC Disease resistance protein Leucine-rich repeat 19 18 18 Disease resistance protein NB-ARC Leucine-rich repeat 18 17 15 Leucine-rich repeat Leucine-rich repeat, plant specif ic Leucine-rich repeat, ty pical subty pe Haem peroxidase, plant/f ungal/bacterial Haem peroxidase Plant peroxidase 18 18 17 17 17 17 Plant lipid transf er/seed storage/try psin-alpha amy lase inhibitor Protein kinase Serine/threonine protein kinase, activ e site Protein kinase-like Ty rosine protein kinase Serine/threonine protein kinase PPR repeat TPR-like Protein preny ltransf erase Peptidase aspartic Anky rin Cy clin-like F-box Cy clin-like F-box Protein kinase-like Protein kinase Serine/threonine protein kinase Ty rosine protein kinase Serine/threonine protein kinase, activ e site Gly coside hy drolase, f amily 17 13 17 17 17 17 17 17 15 11 16 17 13 14 17 17 16 16 16 16 23 23 23 23 23 22 22 22 22 22 22 21 21 21 21 21 20 20 20 20 20 20 19 19 19 19 19 18 18 18 18 18 18 18 18 17 17 17 17 17 17 17 17 17 17 17 17 17 17 17 17 17 17 17 17 17 17 16 IPR009007 NO IPR IPR002213 IPR002885 IPR008940 IPR008941 IPR002213 IPR011009 IPR002290 IPR001245 IPR000719 IPR008271 IPR000210 IPR008974 IPR002083 IPR001128 IPR002401 IPR000719 IPR001245 IPR011009 IPR002290 IPR008271 IPR002401 IPR001128 IPR005123 IPR004253 IPR001128 IPR002401 IPR002885 IPR008941 IPR008940 IPR000379 NO IPR NO IPR IPR002885 IPR008941 IPR008940 IPR002110 IPR001611 IPR007090 IPR003591 
IPR002885 IPR008940 IPR008941 IPR002213 IPR009007 IPR001810 IPR003676 IPR003480 IPR001878 IPR001087 IPR002213 IPR001128 IPR002401 IPR001245 IPR000719 IPR011009 IPR002290 Peptidase aspartic 11 UDP-glucuronosy l/UDP-glucosy ltransf erase PPR repeat Protein preny ltransf erase TPR-like UDP-glucuronosy l/UDP-glucosy ltransf erase Protein kinase-like Serine/threonine protein kinase Ty rosine protein kinase Protein kinase Serine/threonine protein kinase, activ e site BTB/POZ TRAF-like MATH Cy tochrome P450 E-class P450, group I Protein kinase Ty rosine protein kinase Protein kinase-like Serine/threonine protein kinase Serine/threonine protein kinase, activ e site E-class P450, group I Cy tochrome P450 2OG-Fe(II) oxy genase superf amily Protein of unknown f unction DUF231 Cy tochrome P450 E-class P450, group I PPR repeat TPR-like Protein preny ltransf erase Esterase/lipase/thioesterase 12 16 12 12 15 16 16 16 16 14 16 15 12 15 12 15 15 15 15 11 14 14 15 14 15 10 15 14 11 14 PPR repeat TPR-like Protein preny ltransf erase Anky rin Leucine-rich repeat Leucine-rich repeat, plant specif ic Leucine-rich repeat, ty pical subty pe PPR repeat Protein preny ltransf erase TPR-like UDP-glucuronosy l/UDP-glucosy ltransf erase Peptidase aspartic Cy clin-like F-box Auxin responsiv e SAUR protein Transf erase Zn-f inger, CCHC ty pe Lipoly tic enzy me, G-D-S-L UDP-glucuronosy l/UDP-glucosy ltransf erase Cy tochrome P450 E-class P450, group I Ty rosine protein kinase Protein kinase Protein kinase-like Serine/threonine protein kinase 15 11 10 14 14 14 12 14 12 11 13 13 11 14 13 9 11 11 13 13 13 13 13 13 16 16 16 16 16 16 16 16 16 16 16 16 16 16 16 15 15 15 15 15 15 15 15 15 15 15 15 15 15 15 15 15 15 15 15 15 15 15 14 14 14 14 14 14 14 14 14 14 14 14 13 13 13 13 13 13 13 13 IPR008271 IPR010255 IPR002016 IPR000823 IPR001810 IPR010616 IPR008941 IPR002885 IPR007118 IPR007112 IPR009009 IPR007117 IPR001245 IPR002290 IPR000719 IPR011009 IPR008271 IPR005630 IPR001906 IPR008949 IPR008930 IPR000767 
IPR002182 IPR001611 IPR011009 IPR008271 IPR000719 IPR002290 IPR001245 IPR002182 IPR000767 IPR001611 IPR009007 IPR002213 IPR002347 IPR002198 NO IPR IPR007113 IPR011051 IPR006045 IPR001929 IPR001810 IPR000668 IPR000169 IPR004265 IPR001810 NO IPR IPR006041 IPR001810 IPR000109 IPR001128 IPR002401 IPR001087 IPR002213 IPR001810 IPR001810 NO IPR IPR000823 Serine/threonine protein kinase, activ e site Haem peroxidase Haem peroxidase, plant/f ungal/bacterial Plant peroxidase Cy clin-like F-box Protein of unknown f unction DUF1210 TPR-like PPR repeat Expansin/Lol pI Expansin 45, endoglucanase-like Barwin-related endoglucanase Pollen allergen/expansin, C-terminal Ty rosine protein kinase Serine/threonine protein kinase Protein kinase Protein kinase-like Serine/threonine protein kinase, activ e site Terpene sy nthase, metal-binding Terpene sy nthase-like Terpenoid sy nthase Terpenoid cy lases/protein preny ltransf erase alpha-alpha toroid Disease resistance protein NB-ARC Leucine-rich repeat Protein kinase-like Serine/threonine protein kinase, activ e site Protein kinase Serine/threonine protein kinase Ty rosine protein kinase NB-ARC Disease resistance protein Leucine-rich repeat Peptidase aspartic UDP-glucuronosy l/UDP-glucosy ltransf erase Glucose/ribitol dehy drogenase Short-chain dehy drogenase/reductase SDR 9 13 13 13 11 13 13 13 12 12 12 12 12 12 12 12 11 12 12 12 11 12 11 9 12 12 12 12 12 12 10 8 8 10 12 12 Cupin region RmlC-like cupin Cupin Germin Cy clin-like F-box Peptidase C1A, papain Peptidase, eukary otic cy steine peptidase activ e site Plant disease resistance response protein Cy clin-like F-box 12 12 12 12 11 12 11 11 9 Pollen Ole e 1 allergen and extensin Cy clin-like F-box TGF-beta receptor, ty pe I/II extracellular region Cy tochrome P450 E-class P450, group I Lipoly tic enzy me, G-D-S-L UDP-glucuronosy l/UDP-glucosy ltransf erase Cy clin-like F-box Cy clin-like F-box 8 9 11 11 10 11 10 9 10 Plant peroxidase 11 13 13 13 13 13 13 13 13 12 12 12 12 12 12 12 12 
12 12 12 12 12 12 12 12 12 12 12 12 12 12 12 12 12 12 12 12 12 12 12 12 12 12 12 12 12 12 11 11 11 11 11 11 11 11 11 11 11 11 IPR002016 IPR010255 IPR008938 IPR000379 IPR001810 NO IPR IPR000767 IPR002182 IPR001611 IPR001611 IPR007090 IPR002290 IPR011009 IPR003591 IPR000719 IPR001245 IPR008271 IPR009007 IPR001461 IPR001223 IPR000209 IPR003137 IPR009020 IPR010259 NO IPR IPR001841 NO IPR IPR001128 IPR002401 IPR005829 IPR003663 IPR007114 IPR005828 IPR002182 IPR000767 IPR001611 IPR000209 IPR003137 IPR002213 IPR001360 IPR009003 IPR001478 IPR001320 IPR001311 IPR011009 IPR001480 IPR000719 IPR008271 IPR001245 IPR002290 IPR003480 NO IPR IPR011009 IPR002182 IPR011009 IPR003612 NO IPR IPR000210 IPR008974 IPR002083 IPR001938 a Haem peroxidase, plant/f ungal/bacterial Haem peroxidase ARM repeat f old Esterase/lipase/thioesterase Cy clin-like F-box 11 11 10 11 8 Disease resistance protein NB-ARC Leucine-rich repeat Leucine-rich repeat Leucine-rich repeat, plant specif ic Serine/threonine protein kinase Protein kinase-like Leucine-rich repeat, ty pical subty pe Protein kinase Ty rosine protein kinase Serine/threonine protein kinase, activ e site Peptidase aspartic Peptidase A1, pepsin Gly coside hy drolase, f amily 18 Peptidase S8 and S53, subtilisin, kexin, sedolisin Protease-associated PA Proteinase inhibitor, propeptide Proteinase inhibitor I9, subtilisin propeptide 11 11 9 11 11 10 10 10 10 10 8 11 10 11 10 10 9 9 Zn-f inger, RING 10 Cy tochrome P450 E-class P450, group I Sugar transporter superf amily Sugar transporter Major f acilitator superf amily General substrate transporter NB-ARC Disease resistance protein Leucine-rich repeat Peptidase S8 and S53, subtilisin, kexin, sedolisin Protease-associated PA UDP-glucuronosy l/UDP-glucosy ltransf erase Gly coside hy drolase, f amily 1 Peptidase, try psin-like serine and cy steine proteases PDZ/DHR/GLGF Ionotropic glutamate receptor Solute-binding protein/glutamate receptor Protein kinase-like Curculin-like (mannose-binding) lectin 
Protein kinase Serine/threonine protein kinase, active site Tyrosine protein kinase Serine/threonine protein kinase Transferase 10 10 9 9 9 9 10 10 7 7 7 9 10 9 8 10 8 10 10 10 9 9 9 10 Protein kinase-like NB-ARC Protein kinase-like Plant lipid transfer/seed storage/trypsin-alpha amylase inhibitor 10 8 10 10 BTB/POZ TRAF-like MATH Thaumatin, pathogenesis-related 10 10 10 10
(a) Domains detected in Interpro for the gene clusters of ten or more members.
11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10

Supplementary Table 10 MicroRNAs in the Oryza genome

Chr    Oryza miRNA(a)  Other miRNA(b)  Total
1      17              1               18
2      21              2               23
3      11              1               12
4      18              1               19
5      7               5               12
6      11              4               15
7      6               4               10
8      15              4               19
9      7               0               7
10     5               1               6
11     5               2               7
12     6               4               10
Total  129             29              158

(a) Predicted homologues of Arabidopsis miRNAs from the Rfam database.
(b) Homologues of experimentally validated miRNAs of other species excluding Arabidopsis.

Supplementary Table 11 snoRNA and spliceosomal RNA genes in the Oryza genome

Chr    snoRNA genes  Spliceosomal RNA genes
1      13            6
2      17            25
3      64            13
4      12            9
5      19            1
6      16            11
7      22            10
8      14            6
9      5             0
10     14            4
11     7             5
12     12            3
Total  215           93

Supplementary Table 12 Organellar sequences in the Nipponbare chromosomes

A Chloroplast inserts

             MUMmer with high stringency                    BLAST with medium stringency
Chr          Inserts  Total length  % of chromosome         Inserts  Total length  % of chromosome
             (No.)    (bp)                                  (No.)    (bp)
1            71       78842         0.183                   67       100307        0.233
2            43       67793         0.190                   42       76179         0.213
3            63       38121         0.112                   61       49043         0.145
4            47       86912         0.252                   40       128137        0.372
5            24       37594         0.138                   23       44081         0.162
6            35       45703         0.147                   34       54531         0.175
7            25       30319         0.106                   24       35699         0.124
8            22       51943         0.184                   21       57576         0.204
9            21       10229         0.047                   19       20515         0.094
10           20       165771        0.739                   14       178504        0.796
11           29       10056         0.041                   29       17127         0.007
12           53       79803         0.298                   47       91214         0.341
Totals       453      703086        0.196                   421      852913        0.238
Ave. % id.            98.68                                          93.12
Genome eq.            5.22                                           6.34

B Mitochondrial inserts

             MUMmer with high stringency                    BLAST with medium stringency
Chr          Inserts  Total length  % of chromosome         Inserts  Total length  % of chromosome
             (No.)    (bp)                                  (No.)    (bp)
1            166      72223         0.168                   197      78260         0.182
2            49       18901         0.053                   57       19305         0.054
3            57       114731        0.338                   92       119718        0.353
4            68       50294         0.146                   94       53267         0.155
5            28       13809         0.051                   32       13068         0.048
6            61       44952         0.144                   79       44253         0.142
7            31       8687          0.03                    30       8117          0.028
8            31       13306         0.047                   44       15314         0.054
9            33       11174         0.051                   41       17295         0.079
10           61       41487         0.185                   85       41347         0.184
11           36       9320          0.038                   50       10507         0.043
12           288      231573        0.865                   390      269642        1.007
Totals       909      630457        0.176                   1191     690093        0.193
Ave. % id.            95.97                                          98.02
Genome eq.            1.29                                           1.41

Supplementary Table 13 Large organellar inserts in the Nipponbare genome

A Chloroplast inserts

Chr  Start on    Stop on     Chromosomal  Start on  Stop on  Ct length  Identity  Strand
     chromosome  chromosome  length (bp)  ct        ct       (bp)       (%)
10   10180595    10311746    131152       117385    113606   130747     98.74     D/R
8    9255091     9290202     35112        58789     93858    35070      99.34     R
10   19671024    19704027    33004        114484    125520   33385      99.22     R
6    23583508    23613061    29554        49883     79423    29541      98.87     R
4    8775246     8795361     20116        10330     30422    20093      99.14     D
4    8748156     8767894     19739        80045     99713    19669      98.49     D
12   5491125     5510781     19657        98906     118637   19732      98.58     D
2    14407382    14426139    18758        6890      25609    18720      98.57     R
4    8691010     8706795     15786        112456    128207   15752      99.18     R
4    8795362     8809793     14432        86406     101147   14742      98.25     D
7    13416762    13431163    14402        120124    134525   14402      99.76     D
12   5510782     5524143     13362        58215     71624    13410      98.61     D
4    8724611     8735889     11279        70435     81665    11231      98.94     R

Total bp in 13 inserts: 376353
Mean % identity: 98.88

B Mitochondrial inserts

Chr  Start on    Stop on     Chromosomal  Start on  Stop on  Mt length  Identity  Strand
     chromosome  chromosome  length (bp)  mt        mt       (bp)       (%)
12   19839260    19879661    40402        51833     92236    40404      99.71     R
12   19975413    19993653    18241        255406    273646   18241      99.85     R
12   19935657    19951039    15383        190687    206049   15363      99.48     D
3    22618288    22632207    13920        37915     51834    13920      99.09     R
12   19880779    19892524    11746        40088     51834    11747      98.8      R
12   19953387    19963860    10474        212838    223312   10475      99.51     D
6    12434844    12444862    10019        41292     51269    9978       95.62     D

Total bp in 7 inserts: 120185
Mean % identity: 99.18

Supplementary Table 14 Effect of average length on transposable element distribution patterns

Transposable element  Average length  Relative centromeric  Correlation with   Correlation with
family                (bp)            abundance(a)          recombination(b)   gene density(b)
hAT                   1554            0.84*                 0.014              0.021
CACTA                 1062            1.15**                -0.117**           -0.251**
Dasheng               1503            2.23**                -0.334**           -0.429**
LINEs                 505             0.55**                0.265**            0.177**
solo LTRs             919             2.13**                -0.416**           -0.472**
other class II        154             0.68**                0.236**            0.152**
IS630/Tc1/mariner     139             0.56**                0.411**            0.320**
IS256/Mutator         1999            0.75**                0.122**            0.095**
SINEs                 118             0.70**                0.139**            0.112**
IS5/Tourist           229             0.64**                0.346**            0.391**
other TEs             295             0.69**                0.208**            0.178**
TRIM                  224             0.85                  -0.016             -0.097**
Ty1/copia             1536            1.36**                -0.222**           -0.352**
Ty3/gypsy             1878            1.85**                -0.268**           -0.515**
genes                 -               0.73**                0.355**            -
exons                 -               0.48**                0.376**            -

(a) Abundance in centromeric and pericentromeric regions relative to random expectation. Values greater than 1 represent over-representation in centromeric and pericentromeric regions, while values less than 1 represent under-representation. Significance evaluated using the chi-square statistic with 1 degree of freedom.
(b) Spearman rank correlation coefficients between element abundance and recombination rate or gene density. Significance: *, p<0.01; **, p<0.001.
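Footnote b of Supplementary Table 14 reports Spearman rank correlation coefficients. As an illustration only, the following Python sketch computes Spearman's rho as the Pearson correlation of average ranks (ties receive the mean of their rank positions). The window values at the bottom are hypothetical and are not taken from the paper; the original analysis may have used different software and windowing.

```python
from math import sqrt


def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman rank correlation: Pearson correlation of the two rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    return cov / sqrt(var_x * var_y)


# Hypothetical per-window values: element counts vs. recombination rate
element_counts = [1, 2, 3, 4, 5]
recombination = [5, 6, 7, 8, 7]
rho = spearman_rho(element_counts, recombination)
```

A negative rho in this setup would correspond to an element family that is more abundant where recombination is low, as the table reports for Ty3/gypsy and Dasheng.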
Supplementary Table 15 Comparison of mapped Kasalath BAC-end sequences with the Nipponbare pseudomolecules

Chr    Mapped clones  SNPs (bp)  Small InDels (bp)  Total alignment (bp)  SNP rate (%)
1      1830           10162      4028               1706473               0.60
2      1524           7562       3051               1427624               0.53
3      1686           8902       3514               1553638               0.57
4      1110           5923       2005               1019687               0.58
5      998            6813       2376               916803                0.74
6      1128           7452       2705               1037609               0.72
7      1002           6608       2327               925108                0.71
8      1034           6296       2399               965522                0.65
9      706            4973       1717               642492                0.77
10     821            5219       1770               762127                0.68
11     773            5212       1805               717266                0.73
12     704            5005       1701               644751                0.78
Total  13316          80127      29398              12319100              0.65

Supplementary Table 16 Pattern of nucleotide substitution between Nipponbare and Kasalath

         Chromosome                                                                          Total
Pattern  1     2     3     4     5     6     7     8     9     10    11    12    substitutions  %
A to G   3539  2630  3119  2002  2406  2592  2394  2171  1649  1755  1718  1696  27671          34.5
T to C   3618  2718  3192  1985  2400  2646  2359  2217  1803  1857  1875  1773  28443          35.5
A to T   894   667   769   573   620   715   575   604   501   511   514   503   7446           9.3
C to A   829   614   746   495   515   579   480   491   423   427   421   392   6412           8.0
T to G   776   585   637   525   544   575   497   473   344   383   381   384   6104           7.6
G to C   506   348   439   343   328   345   303   340   253   286   303   257   4051           5.1
Total                                                                            80127

Supplementary Table 17. Locations of SSRs relative to genetic markers. Too large to display. Please see http://ricelab.plbr.cornell.edu/publications/2005/IRGSP/supplementaltable16.xls

Supplementary Table 18. Information on all SSRs. Too large to display.
Please see http://ricelab.plbr.cornell.edu/publications/2005/IRGSP/supplementaltable17.xls

Supplementary Table 19 Coverage of the pseudomolecules by the draft sequences

Chromosome    Pseudomolecule  BGI aligned    Coverage  Syngenta aligned  Coverage
              (bp)            length(a) (bp) (%)       length(b) (bp)    (%)
1             43261740        31010925       71.7      34994029          80.9
2             35954743        25659870       71.4      29339813          81.6
3             36192742        28461838       78.6      28921566          79.9
4             35498469        24328866       68.5      27745686          78.2
5             29737217        22571349       75.9      23550481          79.2
6             30731886        21748471       70.8      24020916          78.2
7             29644043        18646802       62.9      23090121          77.9
8             28434780        20431436       71.9      22569219          79.4
9             22696651        15646883       68.9      17977251          79.2
10            22685906        15204475       67.0      16947181          74.7
11            28386948        17637283       62.1      19395129          68.3
12            27566993        16190624       58.7      21249334          77.1
Total         370792118       257538822      69.5      289800726         78.2
Gene models   37544           22376          59.6      26424             70.4
Supported(c)  9485            6482           68.3      7139              75.3

(a) Aligned length of 50,231 contigs of the O. sativa ssp. indica 93-11 assembly on the IRGSP pseudomolecules requiring matches of at least 80% of the length of the contig and at least 50% identity.
(b) Aligned length of 35,047 contigs of the Syngenta assembly of O. sativa ssp. japonica cv. Nipponbare on the IRGSP pseudomolecules requiring matches of at least 95% coverage and 95% identity.
(c) Gene models supported by coverage of full-length cDNAs for 90% of their length.

Supplementary Table 20 Comparison of BGI assemblies with IRGSP chromosome 1S(a)

Row labels: Contigs with homology; Duplicate contigs; Mis-mapped; Non-homologous; Total contig length (bp); Average contig length (bp); Non-redundant coverage (bp); Overlap (bp); Mis-matches (bp)
BGI 93-11 assembly (93 contigs): 71 36 76.34% 50.70% 22 993,515 10,683 710,471 4,176 4,108 23.66% 82.31% 0.59% 0.58%
BGI Syngenta assembly (70 contigs): 59 6 11 84.29% 16.67% 15.71% 848,454 12,121 724,411 6,799 347 81.94% 0.94% 0.05%

(a) Comparison with the first 875,786 bp of chromosome 1 pseudomolecule from the telomere of chromosome 1S.
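The coverage columns of Supplementary Table 19 are the aligned length expressed as a percentage of the pseudomolecule length. A short Python sketch of that calculation follows; the function name and the dictionary layout are illustrative choices, not part of the original analysis.

```python
def coverage_percent(aligned_bp: int, pseudomolecule_bp: int) -> float:
    """Aligned draft sequence as a percentage of the pseudomolecule length."""
    return 100.0 * aligned_bp / pseudomolecule_bp


# (pseudomolecule bp, BGI aligned bp, Syngenta aligned bp) from Supplementary Table 19
table19 = {
    "1": (43_261_740, 31_010_925, 34_994_029),
    "12": (27_566_993, 16_190_624, 21_249_334),
    "Total": (370_792_118, 257_538_822, 289_800_726),
}

coverage = {
    chrom: (round(coverage_percent(bgi, length), 1),
            round(coverage_percent(syngenta, length), 1))
    for chrom, (length, bgi, syngenta) in table19.items()
}
print(coverage["Total"])  # (69.5, 78.2)
```

The same ratio applied to the gene-model rows reproduces their percentages, for example 22376 of 37544 models is 59.6%.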
Supplem entary T abl e 21Distribution A of CentO sequences in the BGI 93-11 assembly Array CentO Chromosome Subject Subject length copies assignment start end (bp) (No.) Chr01 1256168 1255350 818 5 Chr01 9043263 9050561 7298 47 Chr01 9186320 9183849 2471 16 Chr01a 18569982 18568711 1271 8 Chr01 18624116 18614322 9794 63 Chr01 20350374 20350920 546 4 Chr01 21844437 21846291 1854 12 Chr01 30295598 30297119 1521 10 Chr01 34802712 34806953 4241 27 Chr01 34809809 34808897 912 6 Chr01 38392847 38388111 4736 31 Chr01 40870249 40865995 4254 27 Chr01 40870331 40877675 7344 47 Chr01 40887738 40889385 1647 11 Chr02 4003051 4007367 4316 28 Chr02 14200941 14198813 2128 14 Chr02 14565278 14574523 9245 60 Chr02 14678314 14677370 944 6 Chr02 17796867 17798320 1453 9 Chr02 17809340 17809753 413 3 Chr02 19850232 19854195 3963 26 Chr02 35899130 35901303 2173 14 Chr03 281437 281027 410 3 Chr03 1377354 1382135 4781 31 Chr03 2702221 2696627 5594 36 Chr03 9382323 9384102 1779 11 Chr03 13940122 13941217 1095 7 Chr03 14834636 14830470 4166 27 Chr03 16656112 16660053 3941 25 Chr03 19407814 19409640 1826 12 Chr03 22015728 22017923 2195 14 Chr03 22098541 22093460 5081 33 Chr03 22130775 22130899 124 1 Chr03 22138764 22139081 317 2 Chr03 22153073 22154263 1190 8 Chr03 22157195 22158990 1795 12 Chr03 22167115 22167197 82 1 Chr03 24294585 24297587 3002 19 Chr03 26395800 26393583 2217 14 Chr03 27690907 27683051 7856 51 Chr03 36457961 36461010 3049 20 Chr03 36464095 36464673 578 4 Chr04 1144279 1141426 2853 18 Chr04 7500829 7500708 121 1 Chr04 7506385 7504551 1834 12 Chr04 7519837 7520932 1095 7 Chr04 7541739 7545495 3756 24 Chr04 10169269 10171379 2110 14 Chr04 10261384 Chr04 28468122 Chr04 31512669 Chr06 9860291 Chr06 11469301 Chr06 12146383 Chr06 12172750 Chr06 14338949 Chr06 15908546 Chr06 16124576 Chr06 16124724 Chr06 19586285 Chr06 19588513 Chr07 6549912 Chr07 11450228 Chr08 7439696 Chr08 12775077 Chr08 13532245 Chr08 14269758 Chr08 14333346 Chr08 14937002 Chr08 14937037 Chr08 21487250 Chr08 
21822077 Chr08 27286179 Chr08 27311471 Chr08 27320917 Chr08 27323434 Chr08 27324908 Chr08 27325348 Chr08 28509724 Chr09 1315388 Chr09 2104886 Chr09 2613609 Chr09 2976137 Chr09 4978842 Chr09 15388053 Chr09 15396620 Chr10 4784639 Chr10 4789705 Chr10 6602523 Chr10 7464130 Chr10 13456190 Chr10 14969094 Chr11 10771594 Chr11 11946953 Chr12 9348020 Chr12 9352184 Chr12 9458337 Chr12 9458524 T otal T otal non-centromeric a 10262440 28466133 31514739 9860412 11465900 12148055 12173306 14337345 15918312 16118167 16127603 19586747 19593212 6550541 11452485 7443556 12773475 13534013 14271579 14330266 14936088 14937284 21489553 21822192 27286753 27307908 27315883 27323846 27325062 27326296 28509851 1312201 2107407 2610676 2975983 4978050 15390390 15401683 4787205 4790963 6603245 7463857 13448564 14970834 10773866 11948283 9344974 9350529 9458192 9459748 1056 1989 2070 121 3401 1672 556 1604 9766 6409 2879 462 4699 629 2257 3860 1602 1768 1821 3080 914 247 2303 115 574 3563 5034 412 154 948 127 3187 2521 2933 154 792 2337 5063 2566 1258 722 273 7626 1740 2272 1330 3046 1655 145 1224 243125 165815 Rows i n red are probabl e centromeric regi ons. 7 13 13 1 22 11 4 10 63 41 19 3 30 4 15 25 10 11 12 20 6 2 15 1 4 23 32 3 1 6 1 21 16 19 1 5 15 33 17 8 5 2 49 11 15 9 20 11 1 8 1569 1070 68.2% Supplementary T able 21B Distribution of CentO sequences in the BGI Syngenta assembly Chrom osom e Subject Subject Array CentO assignm ent start end length copi es (bp) (No.) 
Chr01a 16731017 16735672 4655 30 Chr01 16738393 16741278 2885 19 Chr01 16745887 16746623 736 5 Chr01 16780879 16790706 9827 63 Chr01 25211958 25210493 1465 9 Chr01 25216170 25213019 3151 20 Chr02 13056387 13055460 927 6 Chr02 13529745 13528220 1525 10 Chr03 252854 252444 410 3 Chr03 3906999 3907126 127 1 Chr03 17720708 17721002 294 2 Chr03 17722657 17728201 5544 36 Chr03 19959184 19961392 2208 14 Chr03 20036563 20036162 401 3 Chr03 20067645 20067770 125 1 Chr03 20099012 20098930 82 1 Chr03 20109917 20108122 1795 12 Chr03 20114786 20113598 1188 8 Chr03 23748094 23749467 1373 9 Chr03 23772663 23776063 3400 22 Chr04 7791498 7794806 3308 21 Chr04 7795363 7796695 1332 9 Chr04 7798649 7804989 6340 41 Chr04 7811886 7814379 2493 16 Chr04 7818720 7823580 4860 31 Chr05 12278501 12282262 3761 24 Chr06 8971387 8971508 121 1 Chr06 14640316 14642546 2230 14 Chr07 10815799 10817384 1585 10 Chr07 10911755 10911887 132 1 Chr07 10911920 10913263 1343 9 Chr07 10918808 10920391 1583 10 Chr07 11458963 11465777 6814 44 Chr07 23154880 23152140 2740 18 Chr08 9437713 9435791 1922 12 Chr08 9443242 9441814 1428 9 Chr08 12176063 12181643 5580 36 Chr08 12273671 12272392 1279 8 Chr08 12799984 12798723 1261 8 Chr08 13500044 13496155 3889 25 Chr08 13500150 13502433 2283 15 Chr08 19353176 19353399 223 1 Chr08 25583844 25583968 124 1 Chr09 1489178 1491035 1857 12 Chr09 2277924 2274987 2937 19 Chr09 2652812 2650745 2067 13 Chr09 2656497 2653667 2830 18 Chr09 2664137 2663077 1060 7 Chr09 Chr09 Chr09 Chr09 Chr09 Chr10 Chr10 Chr10 Chr10 Chr10 Chr11 Chr12 Chr12 Chr12 Chr12 Chr12 Chr12 Chr12 Chr12 Chr12 T otal Non-centromeric a 2676134 2695489 2726512 2729195 2735519 6046114 6049145 6247385 7254983 7979969 10903315 7386026 9447045 9463727 9491249 9509827 9552810 9563131 9578031 9578140 subtotal 2677518 2696408 2726207 2729551 2733547 6047077 6050552 6246088 7256528 7979696 10908209 7387229 9453431 9466354 9496539 9514860 9550532 9560252 9573203 9581952 1384 919 305 356 1972 963 1407 1297 1545 273 4894 
1203 6386 2627 5290 5033 2278 2879 4828 3812 140321 137475 Rows in red are probabl e centromeri c regions. 9 6 2 2 13 6 9 8 10 2 32 8 41 17 34 32 15 19 31 25 905 293 32.4%
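The CentO copy numbers in Supplementary Table 21 appear consistent with dividing each array length by the 155-bp unit length and rounding to the nearest whole unit; this rounding rule is an inference from the listed values, not something the table states. A Python sketch under that assumption, with illustrative function names:

```python
CENTO_UNIT_BP = 155  # CentO unit length given in Supplementary Table 3


def cento_copies(array_length_bp: int) -> int:
    """Estimated CentO copies in an array (assumes rounding to the nearest unit)."""
    return round(array_length_bp / CENTO_UNIT_BP)


def non_centromeric_share(non_centromeric_copies: int, total_copies: int) -> float:
    """Percentage of CentO copies assigned outside probable centromeric regions."""
    return 100.0 * non_centromeric_copies / total_copies


# Array lengths (bp) from the first rows of Supplementary Table 21A
copies = [cento_copies(length) for length in (818, 7298, 2471, 1271, 9794, 546)]
print(copies)  # [5, 47, 16, 8, 63, 4]

share_93_11 = round(non_centromeric_share(1070, 1569), 1)   # 68.2
share_syngenta = round(non_centromeric_share(293, 905), 1)  # 32.4
```

The two shares match the 68.2% and 32.4% given in the total rows of Supplementary Tables 21A and 21B.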