whatshap Documentation
Release 0.1.dev0

Murray Patterson, Alexander Schönhuth, Tobias Marschall, Marcel

January 20, 2015

Contents

1 Links
1.1 Table of contents

WhatsHap is software for phasing genomic variants using DNA sequencing reads, a task also called haplotype assembly. It is especially suitable for long reads, but also works well with short reads.

If you use WhatsHap, please cite:

Murray Patterson, Tobias Marschall, Nadia Pisanti, Leo van Iersel, Leen Stougie, Gunnar W. Klau, Alexander Schönhuth. WhatsHap: Haplotype Assembly for Future-Generation Sequencing Reads. Proceedings of ACM 18th Annual International Conference on Research in Computational Biology (RECOMB), 237-249, 2014.

The version of WhatsHap you find here is the result of further development focused on making the software easy and straightforward to use. WhatsHap is now Open Source software under the MIT license and we welcome contributions.

Note: WhatsHap is work in progress! In particular, the documentation is incomplete, not all features that we would like to have for an initial release are there, and there are probably bugs.

1 Links

• Bitbucket page
• Read the documentation online. Offline documentation is available in the doc/ subdirectory in the repository and in the downloaded tar distribution.

1.1 Table of contents

1.1.1 Installation

Requirements

WhatsHap is implemented in C++ and Python. You need a C++ compiler, Python 3.2 (or later) and the corresponding Python header files. On Ubuntu, make sure the packages build-essential and python3-dev are installed.
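
The requirements above can be checked before attempting an installation. The following is only a minimal sketch and not part of WhatsHap: the function name check_requirements and the list of compiler names it probes are assumptions made for illustration, and shutil.which requires Python 3.3 or later.

```python
import os
import shutil
import sys
import sysconfig


def check_requirements(min_version=(3, 2)):
    """Report which of the build requirements appear to be met.

    Hypothetical helper for illustration; WhatsHap does not ship it.
    """
    # Directory where this interpreter expects its C header files.
    include_dir = sysconfig.get_paths()["include"]
    # Assumed compiler names; the first one found on $PATH is reported.
    candidates = ("c++", "g++", "clang++")
    compiler = next((name for name in candidates if shutil.which(name)), None)
    return {
        "python_ok": sys.version_info[:2] >= min_version,
        "compiler": compiler,
        "headers_ok": os.path.isfile(os.path.join(include_dir, "Python.h")),
    }


if __name__ == "__main__":
    for key, value in check_requirements().items():
        print(key, "=", value)
```

If headers_ok is reported as False on Ubuntu, the python3-dev package is the likely missing piece; if compiler is None, build-essential is.
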

Quickstart

As soon as there is a release, this should work:

    pip3 install --user WhatsHap

Then add $HOME/.local/bin to your $PATH and run the tool:

    export PATH=$HOME/.local/bin:$PATH
    whatshap --help

Regular installation

There is currently no release of WhatsHap, so you need to install it from the Bitbucket repository instead. Make sure you also have installed Cython:

    pip3 install --user Cython
    pip3 install --user https://bitbucket.org/whatshap/whatshap/get/master.tar.gz

This installs WhatsHap into $HOME/.local/bin. The Cython requirement will be dropped when there is a first release.

You can also use a virtualenv instead, but you need to make sure that you have installed Cython into the virtualenv before installing WhatsHap:

    virtualenv -p python3 venv
    venv/bin/pip3 install Cython
    venv/bin/pip3 install https://bitbucket.org/whatshap/whatshap/get/master.tar.gz

If you get errors while installing Cython, try to add --install-option="--no-cython-compile" to the command, see also issue 43.

Development installation

For development, make sure that you install Cython. We also recommend using a virtualenv. This sequence of commands should work:

    git clone https://bitbucket.org/whatshap/whatshap
    cd whatshap
    virtualenv -p python3 venv
    venv/bin/pip3 install Cython
    venv/bin/python3 setup.py develop

Then you can run WhatsHap like this:

    venv/bin/whatshap --help

Development installation (alternative)

Alternatively, if you do not want to use virtualenv, you can do the following:

    git clone https://bitbucket.org/whatshap/whatshap.git
    cd whatshap
    python3 setup.py build_ext -i --cython
    bin/whatshap

This requires Cython, pysam, and pyvcf to be installed.

Installing other Python versions in Ubuntu

Ubuntu comes with one default Python 3 version, and in order to test WhatsHap with other Python versions (3.2, 3.3 and 3.4), use the “deadsnakes” repository.
Ensure you have the following packages:

    sudo apt-get install build-essential python-software-properties

Then get and install the desired Python versions. For example, for Python 3.2:

    sudo add-apt-repository ppa:fkrull/deadsnakes
    sudo apt-get update
    sudo apt-get install python3.2-dev python3-setuptools

If pip and virtualenv are not available, install them. Since they are so essential, we use sudo to install them system-wide, but you can also install them into your $HOME by omitting the sudo and adding the --user option instead:

    sudo easy_install3 pip
    sudo pip3 install virtualenv

1.1.2 User guide

Run WhatsHap like this:

    python3 -m whatshap input.vcf input.bam > phased.vcf

Phasing information is added to the VCF file in a way that is compatible with GATK’s ReadBackedPhasing. That is, the HP tag denotes which set of phased variants a variant belongs to.

Debugging

To debug a crash in the test suite, run the tests under gdb:

    $ gdb python3
    (gdb) run -m nose

After you get a SIGSEGV, let gdb print a backtrace:

    (gdb) bt

1.1.3 Various notes

• There is a step in which variants are re-discovered in the BAM file. This may fail when the variant caller has used some type of re-alignment (as freebayes does). It would be better to integrate this step into the variant caller or to get the information out of it. This applies only to indels, which are not supported right now anyway.
• Input format for HapCompass: http://www.brown.edu/Research/Istrail_Lab/resources/hapcompass_manual.html#sec11

1.1.4 File formats

Phasing in VCFs

• Originally, phasing was expressed only through the genotype separator, that is, 0|1, 1|0 etc. per entry.
• Later, a ‘phase set’ (PS) was added as a FORMAT field: entries with the same PS are in the same set of phased genotypes.

GATK VCF phasing syntax

GATK adds these format tags:

    ##FORMAT=<ID=HP,Number=.,Type=String,Description="Read-backed phasing haplotype identifiers">
    ##FORMAT=<ID=PQ,Number=1,Type=Float,Description="Read-backed phasing quality">

Example (edited excerpt; the sample column is truncated):

    POS  REF  ALT  QUAL     FORMAT                         SAMPLE
    24   G    T    4399.41  GT:AO:DP:GQ:HP:PL:QA:QR:RO     0/1:136:181:99:24-1,24-2:4413,0,1289:5040:1568:
    72   T    G    4229.54  GT:AO:DP:GQ:HP:PL:PQ:QA:QR:RO  0/1:133:199:99:24-1,24-2:4244,0,1991:35.77:
    84   T    G    3027.84  GT:AO:DP:GQ:HP:PL:PQ:QA:QR:RO  0/1:93:181:99:24-1,24-2:3042,0,2873:98.44:
    194  G    A    259.80   GT:AO:DP:GQ:HP:PL:PQ:QA:QR:RO  0/1:10:49:99:24-1,24-2:274,0,1205:31.77:
    254  T    A    1041.12  GT:AO:DP:GQ:HP:PL:PQ:QA:QR:RO  0/1:31:55:99:24-2,24-1:1055,0,838:31.60:
    448  C    T    311.52   GT:AO:DP:GQ:HP:PL:PQ:QA:QR:RO  0/1:12:58:99:24-1,24-2:325,0,1501:37.13:
    653  T    G    298.88   GT:AO:DP:GQ:HP:PL:PQ:QA:QR:RO  0/1:9:26:99:24-2,24-1:313,0,587:36.98:

• The PQ tag is not added for the first variant.
• Indels are not phased.
• Forum links:
  https://gatkforums.broadinstitute.org/discussion/4226/
  https://gatkforums.broadinstitute.org/discussion/4038/
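
To illustrate how the HP values in the excerpt relate to the 0|1 notation, here is a minimal sketch in Python. It is not WhatsHap or GATK code. It assumes that each HP entry has the form `<phase set>-<haplotype number>`, that the entries are listed in the same order as the alleles in GT, and that all entries of one genotype share a phase set; the function names parse_hp and phased_genotype are made up for this example.

```python
def parse_hp(hp_value):
    """Split an HP value such as '24-1,24-2' into (phase_set, haplotype_numbers).

    Assumes the '<phase set>-<haplotype number>' layout seen in the excerpt.
    """
    phase_sets = []
    numbers = []
    for item in hp_value.split(","):
        # Split at the last '-' so the haplotype number is always the suffix.
        phase_set, _, number = item.strip().rpartition("-")
        phase_sets.append(phase_set)
        numbers.append(int(number))
    if len(set(phase_sets)) != 1:
        raise ValueError("HP entries refer to different phase sets: " + hp_value)
    return phase_sets[0], tuple(numbers)


def phased_genotype(gt, hp_value):
    """Rewrite an unphased GT such as '0/1' as a phased one using HP.

    The allele assigned to haplotype 1 is written first.
    """
    alleles = gt.replace("|", "/").split("/")
    phase_set, numbers = parse_hp(hp_value)
    if len(alleles) != len(numbers):
        raise ValueError("GT and HP have different numbers of entries")
    ordered = [allele for _, allele in sorted(zip(numbers, alleles))]
    return phase_set, "|".join(ordered)


if __name__ == "__main__":
    # Values taken from positions 24 and 254 of the excerpt.
    print(phased_genotype("0/1", "24-1,24-2"))
    print(phased_genotype("0/1", "24-2,24-1"))
```

Under these assumptions, the variants at positions 254 and 653 (HP value 24-2,24-1) carry their alternative allele on the opposite haplotype from the other five variants in phase set 24.
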