whatshap Documentation
Release 0.1.dev0
Murray Patterson, Alexander Schönhuth, Tobias Marschall, Marcel
January 20, 2015
WhatsHap is software for phasing genomic variants using DNA sequencing reads, a task also called haplotype assembly.
It is especially suitable for long reads, but also works well with short reads.
If you use WhatsHap, please cite:
Murray Patterson, Tobias Marschall, Nadia Pisanti, Leo van Iersel, Leen Stougie, Gunnar W. Klau,
Alexander Schönhuth. WhatsHap: Haplotype Assembly for Future-Generation Sequencing Reads. Proceedings of ACM 18th Annual International Conference on Research in Computational Biology (RECOMB), 237-249, 2014.
The version of WhatsHap you find here is the result of further development focused on making the software easy and
straightforward to use. WhatsHap is now Open Source software under the MIT license and we welcome contributions.
Note: WhatsHap is a work in progress! In particular, the documentation is incomplete, not all features that we would
like to have for an initial release are there, and there are probably bugs.
CHAPTER 1
Links
• Bitbucket page
• Read the documentation online. Offline documentation is available in the doc/ subdirectory in the repository
and in the downloaded tar distribution.
1.1 Table of contents
1.1.1 Installation
Requirements
WhatsHap is implemented in C++ and Python. You need a C++ compiler, Python 3.2 (or later) and the
corresponding Python header files. On Ubuntu, make sure the packages build-essential and python3-dev
are installed.
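The requirements above can be checked before attempting a build. The following is a minimal sketch and not part of WhatsHap: the function name check_build_requirements is hypothetical, and it assumes that a C++ compiler is reachable on the PATH as g++, c++ or clang++. Note that shutil.which requires Python 3.3 or later.

```python
import os
import shutil
import sys
import sysconfig


def check_build_requirements(compilers=("g++", "c++", "clang++")):
    """Return a dict describing which build prerequisites were found.

    Hypothetical helper for illustration; not part of WhatsHap.
    """
    include_dir = sysconfig.get_paths()["include"]
    return {
        # WhatsHap requires Python 3.2 or later.
        "python_ok": sys.version_info >= (3, 2),
        # Python.h is provided by the python3-dev package on Ubuntu.
        "headers_ok": os.path.isfile(os.path.join(include_dir, "Python.h")),
        # Assumption: any of these executable names is a usable C++ compiler.
        "compiler_ok": any(shutil.which(name) for name in compilers),
    }


if __name__ == "__main__":
    for name, ok in sorted(check_build_requirements().items()):
        print("{}: {}".format(name, "found" if ok else "MISSING"))
```

A MISSING entry for headers_ok usually means the development package for the running interpreter is not installed.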
Quickstart
As soon as there is a release, this should work:
pip3 install --user WhatsHap
Then add $HOME/.local/bin to your $PATH and run the tool:
export PATH=$HOME/.local/bin:$PATH
whatshap --help
Regular installation
There is currently no release of WhatsHap, so you need to install it from the Bitbucket repository instead. Make sure
that Cython is installed as well:
pip3 install --user Cython
pip3 install --user https://bitbucket.org/whatshap/whatshap/get/master.tar.gz
This installs WhatsHap into $HOME/.local/bin. The Cython requirement will be dropped when there is a first
release.
You can also use a virtualenv instead, but you need to make sure that you have installed Cython into the virtualenv
before installing WhatsHap:
virtualenv -p python3 venv
venv/bin/pip3 install Cython
venv/bin/pip3 install https://bitbucket.org/whatshap/whatshap/get/master.tar.gz
If you get errors while installing Cython, try adding --install-option="--no-cython-compile" to the
command; see also issue 43.
Development installation
For development, make sure that you install Cython. We also recommend using a virtualenv. This sequence of
commands should work:
git clone https://bitbucket.org/whatshap/whatshap
cd whatshap
virtualenv -p python3 venv
venv/bin/pip3 install Cython
venv/bin/python3 setup.py develop
Then you can run WhatsHap like this:
venv/bin/whatshap --help
Development installation (alternative)
Alternatively, if you do not want to use virtualenv, you can do the following:
git clone https://bitbucket.org/whatshap/whatshap.git
cd whatshap
python3 setup.py build_ext -i --cython
bin/whatshap
This requires Cython, pysam, and pyvcf to be installed.
Installing other Python versions in Ubuntu
Ubuntu ships with a single default Python 3 version. To test WhatsHap with other Python versions (3.2, 3.3
and 3.4), use the “deadsnakes” repository. Ensure you have the following packages:
sudo apt-get install build-essential python-software-properties
Then get and install the desired Python versions. For example, for Python 3.2:
sudo add-apt-repository ppa:fkrull/deadsnakes
sudo apt-get update
sudo apt-get install python3.2-dev python3-setuptools
If pip and virtualenv are not available, install them. Since they are so essential, we use sudo to install them system-wide, but you can also install them into your $HOME by omitting sudo and adding the --user option instead:
sudo easy_install3 pip
sudo pip3 install virtualenv
1.1.2 User guide
Run WhatsHap like this:
python3 -m whatshap input.vcf input.bam > phased.vcf
Phasing information is added to the VCF file in a way that is compatible with GATK’s ReadBackedPhasing. That is,
the HP tag denotes which set of phased variants a variant belongs to.
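As an illustration of how a downstream script might read the HP tag, the sketch below groups variant positions by the phase set named in HP. It makes several assumptions: a single-sample VCF, HP values of the form "24-1,24-2" in which the part before the last "-" identifies the set, and plain string splitting instead of a VCF library. The function name, the chromosome name chr1 and the unphased record at position 90 are made up for the example.

```python
from collections import OrderedDict


def phase_sets_from_vcf_lines(lines):
    """Map each phase set identifier to the positions of its variants.

    Assumes a single-sample VCF whose HP values look like "24-1,24-2",
    where the part before the last "-" identifies the phase set.
    """
    sets = OrderedDict()
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header and empty lines
        fields = line.rstrip("\n").split("\t")
        position = int(fields[1])
        # Pair the FORMAT keys with the values of the (only) sample column.
        sample = dict(zip(fields[8].split(":"), fields[9].split(":")))
        hp = sample.get("HP")
        if hp is None or hp == ".":
            continue  # no HP tag: the variant is unphased
        set_id = hp.split(",")[0].rsplit("-", 1)[0]
        sets.setdefault(set_id, []).append(position)
    return sets


# Illustrative records; only GT and HP are kept in the sample column.
example = [
    "##fileformat=VCFv4.1\n",
    "chr1\t24\t.\tG\tT\t4399.41\t.\t.\tGT:HP\t0/1:24-1,24-2\n",
    "chr1\t72\t.\tT\tG\t4229.54\t.\t.\tGT:HP\t0/1:24-1,24-2\n",
    "chr1\t90\t.\tA\tC\t50.0\t.\t.\tGT\t0/1\n",
    "chr1\t254\t.\tT\tA\t1041.12\t.\t.\tGT:HP\t0/1:24-2,24-1\n",
]
print(phase_sets_from_vcf_lines(example))
```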
Debugging
To debug a crash, run the tests (nose) under gdb:
$ gdb python3
(gdb) run -m nose
After you get a SIGSEGV, let gdb print a backtrace:
(gdb) bt
1.1.3 Various notes
• There is a step in which variants are re-discovered in the BAM file. This may fail when the variant caller has
used some type of re-alignment (as freebayes does). It would be better to integrate this step into the variant caller or
to obtain the information from it. This applies only to indels, which are not supported right now anyway.
• Input format for HapCompass: http://www.brown.edu/Research/Istrail_Lab/resources/hapcompass_manual.html#sec11
1.1.4 File formats
Phasing in VCFs
• Originally, phasing was expressed only via 0|1, 1|0 etc. in the genotype of each entry.
• Later, a ‘phase set’ (PS) was added as a per-genotype (FORMAT) field: entries with the same PS are in the same set of phased genotypes.
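The two conventions listed above can be told apart with a few lines of code. This is a minimal sketch under stated assumptions: genotypes are diploid, the PS value of an entry is passed in separately (or None if absent), and the function names as well as the PS values "24" and "448" are hypothetical.

```python
def describe_genotype(gt, ps=None):
    """Summarise the phasing state of one diploid genotype string.

    gt is a GT value such as "0|1" or "0/1"; ps is the optional
    phase set (PS) value of the same entry.
    """
    if "|" in gt:
        first, second = gt.split("|")
        return {"phased": True, "haplotypes": (first, second), "phase_set": ps}
    # With "/" the order of the alleles carries no haplotype information.
    return {"phased": False, "haplotypes": None, "phase_set": None}


def same_phase_set(entry_a, entry_b):
    """Return True if both entries are phased and share a phase set.

    Entries without a PS value (None) are treated as one common set,
    which mirrors the original convention without phase sets.
    """
    return (
        entry_a["phased"]
        and entry_b["phased"]
        and entry_a["phase_set"] == entry_b["phase_set"]
    )


a = describe_genotype("0|1", ps="24")
b = describe_genotype("1|0", ps="24")
c = describe_genotype("0|1", ps="448")
d = describe_genotype("0/1")
print(same_phase_set(a, b), same_phase_set(a, c), same_phase_set(a, d))
```

Only entries in the same phase set can be compared: a and b lie on opposite haplotypes of set "24", while nothing can be said about a relative to c.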
GATK VCF phasing syntax
GATK’s ReadBackedPhasing adds these FORMAT tags:
##FORMAT=<ID=HP,Number=.,Type=String,Description="Read-backed phasing haplotype identifiers">
##FORMAT=<ID=PQ,Number=1,Type=Float,Description="Read-backed phasing quality">
Example (edited excerpt):
POS  REF  ALT  QUAL     FORMAT                         SAMPLE
24   G    T    4399.41  GT:AO:DP:GQ:HP:PL:QA:QR:RO     0/1:136:181:99:24-1,24-2:4413,0,1289:5040:1568:
72   T    G    4229.54  GT:AO:DP:GQ:HP:PL:PQ:QA:QR:RO  0/1:133:199:99:24-1,24-2:4244,0,1991:35.77:
84   T    G    3027.84  GT:AO:DP:GQ:HP:PL:PQ:QA:QR:RO  0/1:93:181:99:24-1,24-2:3042,0,2873:98.44:
194  G    A    259.80   GT:AO:DP:GQ:HP:PL:PQ:QA:QR:RO  0/1:10:49:99:24-1,24-2:274,0,1205:31.77:
254  T    A    1041.12  GT:AO:DP:GQ:HP:PL:PQ:QA:QR:RO  0/1:31:55:99:24-2,24-1:1055,0,838:31.60:
448  C    T    311.52   GT:AO:DP:GQ:HP:PL:PQ:QA:QR:RO  0/1:12:58:99:24-1,24-2:325,0,1501:37.13:
653  T    G    298.88   GT:AO:DP:GQ:HP:PL:PQ:QA:QR:RO  0/1:9:26:99:24-2,24-1:313,0,587:36.98:
• The PQ tag is not added for the first variant.
• Indels are not phased.
• Forum links: https://gatkforums.broadinstitute.org/discussion/4226/ https://gatkforums.broadinstitute.org/discussion/4038/
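To make the HP values in the excerpt concrete, the sketch below derives the two haplotypes of phase set 24 from the seven rows. It rests on one assumption that is not stated in this document: the i-th identifier in HP names the haplotype carrying the i-th allele listed in GT, so "24-2,24-1" together with 0/1 places the reference allele on haplotype 24-2. The function name is hypothetical.

```python
def haplotypes_from_rows(rows):
    """Build the allele lists of the haplotypes named in the HP values.

    Each row is (position, ref, alt, gt, hp). Assumption: the i-th
    identifier in hp names the haplotype that carries the i-th allele
    listed in gt.
    """
    haplotypes = {}
    for position, ref, alt, gt, hp in rows:
        alleles = [ref, alt]
        for allele_index, hap_id in zip(gt.split("/"), hp.split(",")):
            base = alleles[int(allele_index)]
            haplotypes.setdefault(hap_id, []).append((position, base))
    return haplotypes


# Rows taken from the excerpt above (POS, REF, ALT, GT, HP).
rows = [
    (24, "G", "T", "0/1", "24-1,24-2"),
    (72, "T", "G", "0/1", "24-1,24-2"),
    (84, "T", "G", "0/1", "24-1,24-2"),
    (194, "G", "A", "0/1", "24-1,24-2"),
    (254, "T", "A", "0/1", "24-2,24-1"),
    (448, "C", "T", "0/1", "24-1,24-2"),
    (653, "T", "G", "0/1", "24-2,24-1"),
]
for hap_id, calls in sorted(haplotypes_from_rows(rows).items()):
    print(hap_id, "".join(base for _, base in calls))
```

Under this reading, the variants at positions 254 and 653 are the ones whose alternative allele lies on haplotype 24-1.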