Next generation sequencing data and analysis
Mapping reads to a reference
09/09/2014 www.qub.ac.uk/igfs

Mapping of reads to a reference sequence
• The raw data
  – Quality assessment of the raw data
  – Trimming the raw data
• Quick overview of some alignment tools
  – MAQ
  – BWA (BWA-backtrack, BWA-MEM, BWA-SW)
  – Bowtie
  – Bowtie2
• SAM/BAM output
  – Alignment information (MAPQ, CIGAR, TLEN)
  – Optional tags (NM, MD, etc.)

The raw data: fastq file
Fastq files of reads are very large. A typical file holds ~10-50 million reads, and each read occupies 4 lines:
• Sequence header
• Sequence
• Quality header
• Qualities

Example records:
@HWI-EAS209_0006_FC706VJ:5:58:5894:21141#ATCACG/1
TTAATTGGTAAATAAATCTCCTAATAGCTTAGATNTTACCTTTCTGACAACTTATAGTT
+
@=?DCDAGBBAA=??<<8CC?EBBEEEEBBBB???<<<<@@BEBEE=89B8BBBEEE;;=6
@HWI-EAS209_0006_FC706VJ:5:58:4694:21321#ATCACG/1
CTATGGCGTAGTAAATAAATCTCCTAATAGCTTAGATATTACCTTCAATAGCTTAGTC
+
BAA=??<<=??<<8CC?EBBEEEDCDAGBBAA=??<<8CCEBBBB??F<<<<@@BEBEE8
@HWI-EAS209_0006_FC706VJ:5:58:3455:21453#ATCACG/1
ATAGCTTGTAGTAAATAAATCTCCATAGCTTTTAGATATTACCTTCAATAGCTTAGTC
+

Qualities in the fastq file: Phred quality scores
• Phred score 10: 1 in 10 probability of an incorrect call (90% base call accuracy)
• Phred score 20: 1 in 100 (99% accuracy)
• Phred score 30: 1 in 1,000 (99.9% accuracy)
• Phred score 40: 1 in 10,000 (99.99% accuracy)
Encoding (Illumina 1.8+ format, Phred+33): each score is stored as the ASCII character whose code is the score plus 33, so the characters below run from score 0 (!) through score 40 (I, 99.99% accuracy) to score 41 (J):
!"#$%&'()*+,-./0123456789:;<=>?@ABCDEFGHIJ

Quality control: FastQC
• Quality assessment of next-generation sequencing data
• Generation of multiple QC plots
• Gives a quick impression of data quality
http://www.bioinformatics.babraham.ac.uk/projects/fastqc/

FastQC for a good Illumina run
[Figure: FastQC report]

FastQC for a bad Illumina run
[Figure: FastQC report]

Processing of reads with the FASTX-Toolkit
• FASTQ Clipper: removes sequencing adapters/linkers from reads
• FASTQ Quality Filter: filters (removes) sequences based on quality
• FASTQ Quality Trimmer: trims (cuts) sequences based on quality
• etc.
See http://hannonlab.cshl.edu/fastx_toolkit/commandline.html

Trimming of paired reads
• fastq-mcf
  – Detects and removes sequencing adapters and primers
  – Detects poor quality at the ends of reads and clips it
  – Detects Ns and removes them from the ends
  – Discards sequences that are too short after all of the above
  – Keeps multiple mate-reads in sync while doing all of the above
  – etc.
  See https://code.google.com/p/ea-utils/wiki/FastqMcf

Post-QC analysis
• De novo assembly (no reference available)
• Mapping reads to a reference

Alignment methods
[Figure: categories of alignment methods (whole genome alignment, short read mapping, short pairwise alignment, database search), with BWA-MEM, BWA-SW, BWA-backtrack, Bowtie, Bowtie2 and Maq placed among them. Adapted from Chaisson and Tesler, BMC Bioinformatics 2012, 13:238.]

NGS data: mapping millions of short reads to a reference sequence (genome/transcriptome)
Examples of mapping tools:
• Maq (2007, ungapped alignment)
• BWA-backtrack (2009, gapped alignment)
• Bowtie (2009, ungapped alignment)
• BWA-SW (2010, gapped alignment)
• BWA-MEM (2013, gapped alignment)
• Bowtie2 (2012, gapped alignment)

Running mapping (see each tool's documentation for options):
bwa index ref.fa
bwa mem ref.fa read1.fq read2.fq > file.sam

maq.pl easyrun -p ref.fa read1.fq read2.fq
maq2sam-long all.map > all.sam

bowtie2-build [options] ref.fa bt2_base
bowtie2 [options] -x bt2_base -1 1.fastq -2 2.fastq -S file.sam

bowtie-build [options] ref.fa bt_base
bowtie [options] -S -x bt_base -1 1.fastq -2 2.fastq > file.sam

Mapping billions of short reads to a genome
[Figure panel: spaced-seed indexing]
[Figure panel: Burrows-Wheeler transform. From Trapnell & Salzberg, Nat Biotechnol. 2009 May;27(5):455-7]

Mapping process
• Create an index of the reference sequence
• Map the reads to the reference sequence
  – Inexact matching
  – Quality awareness
  – Global (end-to-end) or local alignment
  – Ungapped or gapped alignment
• Output is usually a SAM/BAM file, which records
  – the alignment of each read
  – mapping qualities
  – matches and mismatches

BWA-backtrack, BWA-MEM, BWA-SW: which BWA algorithm should I choose?
• BWA-backtrack
  – better for shorter sequences
  – gapped alignment
  – designed for sequencing error rates < 2%
• BWA-MEM
  – for Illumina reads > 70 bp, 454, Ion Proton
  – gapped alignment
  – tolerates more errors
• BWA-SW
  – for longer sequences when alignment gaps are likely to be frequent
  – gapped alignment
  – tolerates more errors
See http://bio-bwa.sourceforge.net/

Bowtie, Bowtie2
• Bowtie: for short reads (< 50 bp) Bowtie may be faster and more sensitive; it performs ungapped alignment
• Bowtie2: for reads > 50 bp Bowtie2 is usually faster and more accurate; it performs gapped alignment and allows multiple gaps
See http://bowtie-bio.sourceforge.net/index.shtml

Problems the aligner has to deal with
• A read will not be exactly the same as the reference sequence, so the aligner has to allow inexact matching and find the best match. Differences come from:
  – Sequencing errors
  – Real differences
    • SNPs, indels, structural rearrangements
    • Ploidy (diploid, tetraploid, etc.)
    • Homozygous or heterozygous variants
• Longer reads are more likely to contain more mismatches
These factors drive the choice of mapping approach and of downstream analysis.

The result: the alignment file
• The result obtained from an alignment tool is an alignment file, usually in SAM/BAM format
• It can be viewed in a genome viewer such as the Integrative Genomics Viewer (IGV)

SAM file (here shown without headers)
[Figure: example SAM file]
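The base qualities on line 4 of each fastq record, and in the QUAL column of the SAM file, use the Phred+33 encoding described above. As a minimal sketch, assuming Illumina 1.8+ (Phred+33) input, the Python below decodes a quality string and converts between Phred scores and error probabilities; the function names are illustrative and not taken from any library.

```python
import math


def phred33_scores(quality: str) -> list:
    """Decode a Phred+33 quality string into integer Phred scores."""
    scores = [ord(ch) - 33 for ch in quality]
    if any(q < 0 for q in scores):
        raise ValueError("character below '!' in a Phred+33 string")
    return scores


def error_probability(q: int) -> float:
    """Probability of an incorrect base call: P = 10^(-Q/10)."""
    return 10 ** (-q / 10)


def phred_from_probability(p: float) -> int:
    """Inverse mapping: Q = -10 * log10(P), rounded to the nearest integer."""
    return round(-10 * math.log10(p))


if __name__ == "__main__":
    # First 15 quality characters of read HWI-ST863:143:D157WACXX:7:1101:15088:3801
    qual = "JIJJJJIIJJJFIJJ"
    print(phred33_scores(qual))
    for q in (10, 20, 30, 40):
        p = error_probability(q)
        print(f"Q{q}: 1 in {round(1 / p):,} calls wrong, {100 * (1 - p):.2f}% accuracy")
```

For the 15 characters used here the scores are 40 or 41, apart from one 37 ('F'), which corresponds to an error probability of about 0.0002.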
SAM/BAM files viewed in IGV
[Figures (three slides): IGV views of the same alignments produced by MAQ, Bowtie, BWA-MEM and Bowtie2]

SAM file
[Figure: a SAM line with its parts labelled: read name, alignment info, sequence of read, quality of each base, optional tags]
The read name, sequence and base qualities of each read are the same as in the fastq file; for a read aligned to the reverse strand (flag 0x10), SEQ is stored reverse-complemented and QUAL reversed. The alignment info and optional tags describe how the read aligns to the reference.

Read name, read sequence, quality of each base:
HWI-ST863:143:D157WACXX:7:1101:15088:3801
CGGCTTTTCGGCTGGCTGCTGGAGGAGCTTGGCGCAGATGGCC
JIJJJJIIJJJFIJJIGGGGGCCEHBEFDACCCDBBD:ACCCDCCCDC;@@

SAM file (Bowtie2)
[Figure: a Bowtie2 SAM line with the same parts labelled: read name, alignment info, sequence of read, quality of each base, optional tags]

From the SAM format specification, 1.4 The alignment section: mandatory fields
In the SAM format, each alignment line typically represents the linear alignment of a segment. Each line has 11 mandatory fields. These fields always appear in the same order and must be present, but their values can be '0' or '*' (depending on the field) if the corresponding information is unavailable. The following table gives an overview of the mandatory fields in the SAM format:

Col  Field  Type    Regexp/Range                Brief description
1    QNAME  String  [!-?A-~]{1,255}             Query template NAME
2    FLAG   Int     [0, 2^16-1]                 bitwise FLAG
3    RNAME  String  \*|[!-()+-<>-~][!-~]*       Reference sequence NAME
4    POS    Int     [0, 2^31-1]                 1-based leftmost mapping POSition
5    MAPQ   Int     [0, 2^8-1]                  MAPping Quality
6    CIGAR  String  \*|([0-9]+[MIDNSHPX=])+     CIGAR string
7    RNEXT  String  \*|=|[!-()+-<>-~][!-~]*     Ref. name of the mate/next read
8    PNEXT  Int     [0, 2^31-1]                 Position of the mate/next read
9    TLEN   Int     [-2^31+1, 2^31-1]           observed Template LENgth
10   SEQ    String  \*|[A-Za-z=.]+              segment SEQuence
11   QUAL   String  [!-~]+                      ASCII of Phred-scaled base QUALity+33

1. QNAME: Query template NAME. Reads/segments having identical QNAME are regarded to come from the same template.
A QNAME '*' indicates the information is unavailable. In a SAM file, a read may occupy multiple alignment lines, when its alignment is chimeric or when multiple mappings are given.
2. FLAG: bitwise FLAG. Each bit is explained in the following table:

Bit     Description
0x1     template having multiple segments in sequencing
0x2     each segment properly aligned according to the aligner
0x4     segment unmapped
0x8     next segment in the template unmapped
0x10    SEQ being reverse complemented
0x20    SEQ of the next segment in the template being reversed
0x40    the first segment in the template
0x80    the last segment in the template
0x100   secondary alignment
0x200   not passing quality controls
0x400   PCR or optical duplicate
0x800   supplementary alignment

(Source: SAM format specification.)

SAM file alignment info (read pairs aligned to isotig11038 by three aligners)
Aligner  FLAG  RNAME        POS   MAPQ  CIGAR      RNEXT  PNEXT  TLEN
BWA      99    isotig11038  1187  60    80M        =      1189   82
BWA      147   isotig11038  1189  60    80M        =      1187   -82
BWA      99    isotig11038  1602  9     53S20M7S   =      1602   20
BWA      147   isotig11038  1602  9     22S20M38S  =      1602   -20
BWA      83    isotig11038  909   60    80M        =      782    -207
BWA      163   isotig11038  782   60    80M        =      909    207
Bowtie2  99    isotig11038  1187  42    80M        =      1189   82
Bowtie2  147   isotig11038  1189  42    80M        =      1187   -82
Bowtie2  83    isotig11038  909   42    80M        =      782    -207
Bowtie2  163   isotig11038  782   42    80M        =      909    207
Bowtie   99    isotig11038  1187  255   80M        =      1189   82
Bowtie   147   isotig11038  1189  255   80M        =      1187   -82
Bowtie   163   isotig11038  782   255   80M        =      909    207
Bowtie   83    isotig11038  909   255   80M        =      782    -207
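Each alignment line is tab-separated, with the 11 mandatory fields listed above followed by any optional TAG:TYPE:VALUE fields. The Python below is a minimal sketch of splitting such a line. The SamRecord and parse_sam_line names are hypothetical; integer-typed tags (type i) are converted to int, while other types (A, f, Z, H, B) are left as strings. The example line reuses the Bowtie2 alignment of read HWI-ST863:143:D157WACXX:7:1101:15088:3801 from this document (FLAG 99, POS 1187, MAPQ 42), with SEQ and QUAL replaced by '*' to keep it short.

```python
from typing import NamedTuple


class SamRecord(NamedTuple):
    qname: str
    flag: int
    rname: str
    pos: int     # 1-based leftmost mapping position (0 if unmapped)
    mapq: int    # 255 = mapping quality not available
    cigar: str
    rnext: str   # '=' means the same reference as RNAME
    pnext: int
    tlen: int    # observed template length, signed
    seq: str
    qual: str
    tags: dict   # optional fields: TAG -> (TYPE, VALUE)


def parse_tag(field: str):
    """Split an optional field TAG:TYPE:VALUE; integer-typed values become int."""
    tag, typ, value = field.split(":", 2)
    return tag, (typ, int(value) if typ == "i" else value)


def parse_sam_line(line: str) -> SamRecord:
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 11:
        raise ValueError(f"expected >= 11 tab-separated fields, got {len(fields)}")
    qname, flag, rname, pos, mapq, cigar, rnext, pnext, tlen, seq, qual = fields[:11]
    tags = dict(parse_tag(f) for f in fields[11:])
    return SamRecord(qname, int(flag), rname, int(pos), int(mapq), cigar, rnext,
                     int(pnext), int(tlen), seq, qual, tags)


# Bowtie2 record for the example read, SEQ and QUAL replaced by '*'
EXAMPLE = "\t".join([
    "HWI-ST863:143:D157WACXX:7:1101:15088:3801", "99", "isotig11038", "1187",
    "42", "80M", "=", "1189", "82", "*", "*",
    "AS:i:-6", "NM:i:1", "MD:Z:10C69", "YS:i:-5", "YT:Z:CP",
])

if __name__ == "__main__":
    rec = parse_sam_line(EXAMPLE)
    print(rec.flag, rec.pos, rec.mapq, rec.cigar, rec.tags["MD"])
```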
MAPQ
• MAPQ: MAPping Quality.
  – It equals -10 log10 Pr{mapping position is wrong}, rounded to the nearest integer.
  – A value of 255 indicates that the mapping quality is not available.
• Examples:
  -10 × log10(0.1) = 10: MAPQ 10 means a 10% (1 in 10) probability that the mapping position is wrong
  -10 × log10(0.01) = 20: MAPQ 20 = 1% (1 in 100)
  -10 × log10(0.001) = 30: MAPQ 30 = 0.1% (1 in 1,000)
  -10 × log10(0.0001) = 40: MAPQ 40 = 0.01% (1 in 10,000)
  -10 × log10(0.00001) = 50: MAPQ 50 = 0.001% (1 in 100,000)
  -10 × log10(0.000001) = 60: MAPQ 60 = 0.0001% (1 in 1,000,000)

SAM file: bitwise flags
Explanation (each bit is 1 = true or 0 = false):
1     read paired
2     read mapped in proper pair
4     read unmapped
8     mate unmapped
16    read reverse strand
32    mate reverse strand
64    first in pair
128   second in pair
256   not primary alignment
512   read fails platform quality checks
1024  read is PCR or optical duplicate
The bits are read as a binary number and converted into a decimal, e.g. 1100011 = 64 + 32 + 2 + 1 = 99.
Flags can be used for filtering.
Useful link: a utility that explains SAM flags in plain English, http://picard.sourceforge.net/explain-flags.html
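Because each flag bit is independent, decoding a FLAG value is a bitwise AND against each bit. A minimal Python sketch, using the bit meanings listed above plus 0x800 (supplementary alignment) from the specification table; explain_flag is a hypothetical helper name:

```python
# Bit values and meanings as listed above; 0x800 comes from the SAM specification table.
FLAG_BITS = [
    (0x1, "read paired"),
    (0x2, "read mapped in proper pair"),
    (0x4, "read unmapped"),
    (0x8, "mate unmapped"),
    (0x10, "read reverse strand"),
    (0x20, "mate reverse strand"),
    (0x40, "first in pair"),
    (0x80, "second in pair"),
    (0x100, "not primary alignment"),
    (0x200, "read fails platform quality checks"),
    (0x400, "read is PCR or optical duplicate"),
    (0x800, "supplementary alignment"),
]


def explain_flag(flag: int) -> list:
    """Return the meaning of every bit that is set (1 = true) in a FLAG value."""
    return [meaning for bit, meaning in FLAG_BITS if flag & bit]


if __name__ == "__main__":
    for flag in (99, 147, 83, 163):
        # format(flag, "b") gives the binary form used on the slides, e.g. 1100011 for 99
        print(flag, format(flag, "b"), "; ".join(explain_flag(flag)))
```

For 99 this prints 1100011 followed by read paired; read mapped in proper pair; mate reverse strand; first in pair, which is the same decoding the Picard explain-flags page performs interactively.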
SAM file: bitwise flag 147
Read the same way: 10010011 = 128 + 16 + 2 + 1 = 147
Flags can be used for filtering.

SAM file: bitwise flags 99 and 147
• 1100011 = 99
  – read paired
  – read mapped in proper pair
  – mate reverse strand
  – first in pair
• 10010011 = 147
  – read paired
  – read mapped in proper pair
  – read reverse strand
  – second in pair

SAM file: bitwise flags
What do flags 83 and 163 mean?
• 1010011 = 64 + 16 + 2 + 1 = 83
  – read paired
  – read mapped in proper pair
  – read reverse strand
  – first in pair
• 10100011 = 128 + 32 + 2 + 1 = 163
  – read paired
  – read mapped in proper pair
  – mate reverse strand
  – second in pair
Flags can be used for filtering.
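Since FLAG and MAPQ are plain integers, filtering reduces to bit tests and a threshold. The sketch below is a hypothetical filter, not a standard one: it keeps mapped, properly paired, primary alignments that are not duplicates or QC failures and have a MAPQ of at least 20 (an arbitrary cut-off chosen for illustration). It also converts MAPQ back into the probability that the mapping position is wrong, following the MAPQ definition given earlier. The (FLAG, MAPQ) pairs are the BWA rows of the alignment-info table.

```python
import math
from typing import Optional

PROPER_PAIR = 0x2
UNMAPPED = 0x4
# secondary (0x100), QC failure (0x200), duplicate (0x400), supplementary (0x800)
EXCLUDE = 0x100 | 0x200 | 0x400 | 0x800
MAPQ_UNAVAILABLE = 255


def mapq_error_probability(mapq: int) -> Optional[float]:
    """P(mapping position is wrong) = 10^(-MAPQ/10); None when MAPQ is 255."""
    if mapq == MAPQ_UNAVAILABLE:
        return None
    return 10 ** (-mapq / 10)


def mapq_from_probability(p: float) -> int:
    """MAPQ = -10 * log10(P), rounded to the nearest integer."""
    return round(-10 * math.log10(p))


def keep(flag: int, mapq: int, min_mapq: int = 20) -> bool:
    """Hypothetical filter: mapped, properly paired, primary, not a duplicate or QC failure."""
    if not flag & PROPER_PAIR or flag & (UNMAPPED | EXCLUDE):
        return False
    # MAPQ 255 means "not available"; letting it pass is a choice made here, not a rule
    return mapq == MAPQ_UNAVAILABLE or mapq >= min_mapq


# (FLAG, MAPQ) of the BWA rows in the alignment-info table
BWA_ROWS = [(99, 60), (147, 60), (99, 9), (147, 9), (83, 60), (163, 60)]

if __name__ == "__main__":
    for flag, mapq in BWA_ROWS:
        p = mapq_error_probability(mapq)
        print(flag, mapq, f"P(wrong) = {p:.2g}", "keep" if keep(flag, mapq) else "drop")
```

Only the soft-clipped BWA pair at position 1602 (MAPQ 9, roughly a 13% chance that the position is wrong) is dropped. On the command line, samtools view applies the same kind of filter with -f (required bits), -F (excluded bits) and -q (minimum MAPQ). The aligners report MAPQ differently for the same well-placed reads (60 for BWA, 42 for Bowtie2 and 255, i.e. not available, for Bowtie in the table), so a threshold chosen for one aligner does not carry over directly to another.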
SAM file: CIGAR string
M = alignment match (sequence match or mismatch)
I = insertion to the reference
D = deletion from the reference
S = soft clipping (clipped bases remain in SEQ)
H = hard clipping (clipped bases are removed from SEQ)
N = skipped region on the reference (long skip, e.g. an intron in RNA-seq alignments)
P = padding (silent deletion from the padded reference)
= and X = sequence match and sequence mismatch (alternatives to M used by some aligners)

The CIGAR string
What do these mean? 25M, 18M1I7M, 3S8M1D6M4S, 9M14N8M

25M: 25 aligned bases (matches or mismatches)
REF:  TGCATTCATGTGAATGTGAATGTAATATGGTGATCGCAC
Read:        ATGCGAATGTGATTGTAATATGGTG

18M1I7M: 18 aligned bases, 1 base inserted in the read, 7 aligned bases
REF:  TGCATTCATGTGAATGTGAATGTAA*TATGGTGATCGCAC
Read:        ATGCGAATGTGATTGTAAATATGGTG

3S8M1D6M4S: 3 soft-clipped bases, 8 aligned, 1 reference base deleted from the read, 6 aligned, 4 soft-clipped (clipped bases in lowercase)
REF:  AGCTAGCATCGTGTCGCCCGTCTAGCATACGCATGATCGAC
Read:        gggGTGTAGCC-GACTAGgggg

9M14N8M: 9 aligned bases, 14 reference bases skipped, 8 aligned bases
REF:  TCGTGTCGCCCGTCTAGCATACGCATGATCGACTGTCAGCTA
Read:   GTGTAACCC..............TGAGCGCC

Read name, alignment info, sequence, qualities, tags (Bowtie2):
HWI-ST863:143:D157WACXX:7:1101:15088:3801 99 isotig11038 1187 42 80M = 1189 82 CGGCTTTTCGGCTGGCTGCTGGAGGAGCTTGGCGCAGATGG JIJJJJIIJJJFIJJIGGGGGCCEHBEFDACCCDBBD:ACCCDCCCDC; AS:i:-6 NM:i:1 MD:Z:10C69 YS:i:-5 YT:Z:CP

SAM file, optional tags (rows in the same order as the alignment-info table)
BWA
  NM:i:1 MD:Z:10C69 AS:i:75 XS:i:0
  NM:i:1 MD:Z:8C71 AS:i:75 XS:i:0
  NM:i:0 MD:Z:20 AS:i:20 XS:i:19 XA:Z:isotig08401,-365,2S19M59S,0;
  NM:i:0 MD:Z:20 AS:i:20 XS:i:19 XA:Z:isotig08401,+365,33S19M28S,0;
  NM:i:2 MD:Z:8A33G37 AS:i:70 XS:i:0
  NM:i:0 MD:Z:80 AS:i:80 XS:i:0
Bowtie2
  AS:i:-6 XN:i:0 XM:i:1 XO:i:0 XG:i:0 NM:i:1 MD:Z:10C69 YS:i:-5 YT:Z:CP
  AS:i:-5 XN:i:0 XM:i:1 XO:i:0 XG:i:0 NM:i:1 MD:Z:8C71 YS:i:-6 YT:Z:CP
  AS:i:-11 XN:i:0 XM:i:2 XO:i:0 XG:i:0 NM:i:2 MD:Z:8A33G37 YS:i:0 YT:Z:CP
  AS:i:0 XN:i:0 XM:i:0 XO:i:0 XG:i:0 NM:i:0 MD:Z:80 YS:i:-11 YT:Z:CP
Bowtie
  XA:i:1 MD:Z:10C69 NM:i:1
  XA:i:0 MD:Z:8C71 NM:i:1
  XA:i:0 MD:Z:80 NM:i:0
  XA:i:0 MD:Z:8A33G37 NM:i:2

SAM file, optional tags
TAG         TYPE          Description
NM          i (integer)   edit distance to the reference
MD          Z (string)    string for mismatching positions
AS          i (integer)   alignment score generated by the aligner
YS          i (integer)   alignment score of the other mate
YT          Z (string)    string representing the alignment type (in Bowtie2, CP = concordant pair)
X?, Y?, Z?  ?             reserved fields for end users

SAM file tags
• Optional fields are in the format <TAG>:<TYPE>:<VALUE>
• Examples:
  – NM:i:1
    • NM: edit distance to the reference
    • i: integer
    • 1: one difference from the reference (here one mismatch)
  – MD:Z:12A12
    • MD: string for mismatching positions
    • Z: printable string
    • 12A12: 12 matching bases, a mismatch where the reference has A, then 12 matching bases

NM and MD tags: BWA-MEM alignment file
What do these CIGAR, NM and MD values mean?
25M, NM:i:2, MD:Z:3T8A12
REF:  TGCATTCATGTGAATGTGAATGTAATATGGTGATCGCAC
Read:        ATGCGAATGTGATTGTAATATGGTG
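CIGAR strings and MD tags are both run-length encodings, so the questions on these slides can be checked mechanically. A minimal Python sketch, assuming well-formed input and following the operator definitions above (M, D, N, = and X consume reference bases; M, I, S, = and X consume read bases); all function names are hypothetical. NM is recomputed as the mismatches from MD plus the inserted and deleted bases from the CIGAR.

```python
import re

CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")
MD_RE = re.compile(r"(\d+)|\^([A-Z]+)|([A-Z])")


def parse_cigar(cigar: str) -> list:
    """'3S8M1D6M4S' -> [(3, 'S'), (8, 'M'), (1, 'D'), (6, 'M'), (4, 'S')]."""
    ops = [(int(n), op) for n, op in CIGAR_RE.findall(cigar)]
    if "".join(f"{n}{op}" for n, op in ops) != cigar:
        raise ValueError(f"malformed CIGAR: {cigar}")
    return ops


def reference_span(cigar: str) -> int:
    """Reference bases covered: M, D, N, = and X consume the reference."""
    return sum(n for n, op in parse_cigar(cigar) if op in "MDN=X")


def query_length(cigar: str) -> int:
    """Read bases in SEQ: M, I, S, = and X consume the read (H and P do not)."""
    return sum(n for n, op in parse_cigar(cigar) if op in "MIS=X")


def md_mismatches(md: str) -> list:
    """(offset, reference base) per mismatch; offset counts aligned reference bases from 0."""
    result, offset = [], 0
    for number, deleted, ref_base in MD_RE.findall(md):
        if number:
            offset += int(number)      # run of matching bases
        elif deleted:
            offset += len(deleted)     # '^' followed by bases deleted from the read
        else:
            result.append((offset, ref_base))
            offset += 1
    return result


def edit_distance(cigar: str, md: str) -> int:
    """NM = mismatches (from MD) + inserted and deleted bases (from CIGAR)."""
    indels = sum(n for n, op in parse_cigar(cigar) if op in "ID")
    return len(md_mismatches(md)) + indels


def md_for_ungapped(ref: str, read: str) -> str:
    """Build the MD string for an ungapped (all-M) alignment of equal-length strings."""
    if len(ref) != len(read):
        raise ValueError("ungapped alignment needs equal lengths")
    parts, run = [], 0
    for r, q in zip(ref, read):
        if r == q:
            run += 1
        else:
            parts.append(f"{run}{r}")
            run = 0
    parts.append(str(run))
    return "".join(parts)


if __name__ == "__main__":
    for cigar in ("25M", "18M1I7M", "3S8M1D6M4S", "9M14N8M", "53S20M7S"):
        print(cigar, "reference span:", reference_span(cigar), "read length:", query_length(cigar))
    for md in ("10C69", "8C71", "8A33G37", "80", "24T44A10"):
        print(md, md_mismatches(md), "NM =", edit_distance("80M", md))
    # 25M example; offset 7 (read placed under the 8th reference base) was worked out by hand
    ref = "TGCATTCATGTGAATGTGAATGTAATATGGTGATCGCAC"
    read = "ATGCGAATGTGATTGTAATATGGTG"
    print(md_for_ungapped(ref[7:7 + len(read)], read))
```

For the reference and read shown in the 25M example, md_for_ungapped returns 3T8A12: the read differs from the reference at its 4th base (reference T) and its 13th base (reference A), so NM is 2. The offset of 7 used to place the read under the reference was derived from the sequences and is an assumption about the original slide layout.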
CIGAR  NM-tag   MD-tag
• 80M  NM:i:1   MD:Z:10C69
• 80M  NM:i:1   MD:Z:8C71
• 80M  NM:i:2   MD:Z:8A33G37
• 80M  NM:i:0   MD:Z:80
• 80M  NM:i:2   MD:Z:24T44A10

SAM file
• MAPQ (mapping quality, unique mapping)
• CIGAR (match, insertion, deletion)
• Optional fields, tags (various)
• Sequence string (of the read)
• Base quality (of each base in the read)
Process with downstream tools to get the desired output.

The SAM/BAM alignment file
• The SAM/BAM file may be processed to identify
  – counts per gene / read depth (RNA-seq, ChIP-seq)
    • gene expression analysis
    • transcription factor binding sites
  – polymorphisms (SNPs, indels, etc.)
  – an annotation file (GTF/GFF) is used to annotate the results

Processing tools for SAM/BAM alignment files
• Samtools/Picard/GATK: sets of tools for working with next-generation sequencing data in BAM format, e.g. deduplication and re-alignment around indels
• Bedtools: comparing genomic features in BED format
• Perl/Python scripts, etc., for more custom exploration of the data
• Downstream statistical analysis in R, etc.

REFERENCES
Mapping short reads onto genomes
• Trapnell C, Salzberg SL. How to map billions of short reads onto genomes. Nat Biotechnol. 2009 May;27(5):455-7.
Bowtie and Bowtie2
• Langmead B, Trapnell C, Pop M, Salzberg SL. Ultrafast and memory-efficient alignment of short DNA sequences to the human genome. Genome Biol. 2009;10(3):R25.
• Langmead B, Salzberg SL. Fast gapped-read alignment with Bowtie 2. Nat Methods. 2012 Mar 4;9(4):357-9.
BWA
• Li H, Durbin R. Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics. 2009 Jul 15;25(14):1754-60.
• Li H, Durbin R. Fast and accurate long-read alignment with Burrows-Wheeler transform. Bioinformatics. 2010 Mar 1;26(5):589-95.
• Li H. Aligning sequence reads, clone sequences and assembly contigs with BWA-MEM. 2013. arXiv:1303.3997v1 [q-bio.GN].
SAM alignment format
• SAM alignment format specification: http://samtools.sourceforge.net/SAMv1.pdf