BioHen: Basic commands and an example of NGS Analysis

Erin Crowgey, VP
Bioinformatics Student Association
Basic Unix Commands
Command                     Description
cd dirname                  change current working directory
pwd                         print working directory
ls -l                       list files in long format
cp filename1 filename2      copy file
mv filename1 filename2      move (or rename) file
rm filename                 remove file
diff filename1 filename2    list differences between files
mkdir dirname               make a new directory
gzip filename               compress file
gunzip filename.gz          uncompress file
more filename               view a file one screen at a time
head filename               look at top of file
tail filename               look at bottom of file
which command               identify location of executable
Further reading:
http://www.bioinformatics.babraham.ac.uk/training.html#unix
http://mally.stanford.edu/~sr/computing/basic-unix.html
http://www.math.utah.edu/lab/unix/unix-commands.html
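The commands listed above can be tried safely in a throwaway directory. This is a minimal sketch, assuming bash and standard GNU/Linux utilities; the file names (notes.txt, copy.txt, renamed.txt) are hypothetical. `more` is left out because it is interactive.

```shell
#!/usr/bin/env bash
# Tour of the basic Unix commands inside a temporary directory, so nothing
# in your home directory is touched. File names are hypothetical examples.
set -euo pipefail

workdir=$(mktemp -d)            # scratch directory for the demo
cd "$workdir"
pwd                             # print working directory

mkdir demo                      # make a new directory
cd demo
printf 'line1\nline2\nline3\n' > notes.txt   # create a small text file

cp notes.txt copy.txt           # copy file
echo 'line4' >> copy.txt        # change the copy so the two files differ
diff notes.txt copy.txt || true # diff exits with status 1 when files differ
mv copy.txt renamed.txt         # move (rename) file

head -n 2 renamed.txt           # top of file: first 2 lines
tail -n 1 renamed.txt           # bottom of file: last line

gzip renamed.txt                # creates renamed.txt.gz and removes renamed.txt
gunzip renamed.txt.gz           # restores renamed.txt
ls -l                           # list files in long format

rm notes.txt                    # remove file
which ls || command -v ls       # location of the ls executable
```

When you are done, the scratch directory can be removed with `rm -r "$workdir"`.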
Editing Text Documents
• Nano
– Text editor using a command line interface
– nano filename
Editing Text Documents
• vi
– Screen-oriented text editor originally created for the Unix operating system
– vi filename
BioHen Documentation
• List of software installed on BioHen Cluster
– http://bioinformatics.udel.edu/Core/BioHen-Software
• Brief description of BioHen computing cluster
– http://bioit.dbi.udel.edu/BIOHEN-Cluster.html
Software on BioHen
http://bioinformatics.udel.edu/Core/BioHen-Software
PATH Variable
• Environment variable
• A colon-delimited list of directories that your shell searches, in order, when you enter a command
• In general, each running process or user session has its own PATH setting
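A minimal sketch of inspecting and extending PATH for the current session, assuming bash. The tool directory is a temporary stand-in for something like `$HOME/bin`, and `mytool` is a hypothetical executable created only for the demo.

```shell
#!/usr/bin/env bash
# Inspect PATH, prepend a directory, and let the shell find a new command.
# The directory and the "mytool" script are hypothetical, for illustration.
set -euo pipefail

# PATH is colon-delimited: print one directory per line.
echo "$PATH" | tr ':' '\n'

# Stand-in for a personal tools directory such as $HOME/bin.
mytools="$(mktemp -d)/bin"
mkdir -p "$mytools"

# Prepend it, so executables here are found before the system ones.
export PATH="$mytools:$PATH"

# Create a tiny executable in that directory.
printf '#!/bin/sh\necho "hello from mytool"\n' > "$mytools/mytool"
chmod +x "$mytools/mytool"

command -v mytool   # shell builtin equivalent of "which mytool"
mytool              # found via PATH, no full path needed
```

The change lasts only for this shell and its child processes; putting the `export PATH=...` line in `~/.bashrc` makes it apply to every new login shell.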
Biohen Computing Cluster
http://bioit.dbi.udel.edu/BIOHEN-Cluster.html
TORQUE Resource Manager
• Important control lines
#PBS -N name_of_job
– sets the name of the job
#PBS -V
– ensures all your environment variables from the head node are passed to the execution node
#PBS -l
– provides a list of resources required for the job (nodes=N:ppn=P,walltime=HH:MM:SS)
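The control lines go at the top of a job script. This is a minimal sketch of such a script, assuming bash on the compute nodes; the job name and resource values are hypothetical placeholders. Because `#PBS` lines are ordinary comments to bash, the script can also be run directly with `bash` for a quick check outside Torque.

```shell
#!/bin/bash
#PBS -N example_job
#PBS -V
#PBS -l nodes=1:ppn=4,walltime=01:00:00
# The #PBS lines are read by Torque at submission time and ignored by bash.
# Job name and resource values above are hypothetical placeholders.

# Torque starts a job in your home directory; PBS_O_WORKDIR holds the
# directory where qsub was run. Fall back to the current directory when
# the script is run outside Torque.
workdir="${PBS_O_WORKDIR:-$PWD}"
cd "$workdir"

echo "Job ${PBS_JOBID:-none} running on $(uname -n) in $workdir"
```

On the cluster it would be submitted with `qsub name_of_script.sh`.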
TORQUE Resource Manager
• qsub name_of_script.sh
– Submits your job to Torque for execution
• qstat
– View the status of Torque jobs
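Submission and status checks can be combined in a small helper. This is a sketch only: `submit_and_check` is a hypothetical function name, and it relies on Torque's `qsub` printing the new job ID on stdout.

```shell
#!/usr/bin/env bash
# Hypothetical helper: submit a Torque job script, then show its status.
# qsub prints the new job ID (e.g. "12345.headnode") on stdout.
submit_and_check() {
    local script="$1"
    local jobid
    jobid=$(qsub "$script") || return 1   # stop if submission failed
    echo "Submitted $script as job $jobid"
    qstat "$jobid"                        # status of just this job
}

# Usage on the cluster (not run here):
#   submit_and_check name_of_script.sh
#   qstat -u "$USER"     # all of your own jobs
```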
Example NGS Analysis on BioHen
NGS (Next-Generation Sequencing) Data Analysis
Analysis support for:
• RNA-Seq
• miRNA
• Resequencing: SNP/InDel
• De novo Genome Assembly
• Genome Structural Variation / Copy Number Variation (CNV)
• ChIP-Seq
• Reduced Representation
• Amplicon Library (16S rRNA)
• Metagenome
• Metatranscriptome
Example Human NGS Analysis
• Goal: detect SNPs within a disease population
– Illumina
• paired-end 50 bp reads (insert size of 300 bp)
• whole genome
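With 50 bp reads and a 300 bp insert, the unsequenced gap between the two mates follows by simple arithmetic. A minimal sketch, assuming "insert size" means the full fragment length including both reads; some labs use the term for the gap only, so check how your library was described.

```shell
#!/usr/bin/env bash
# Mate inner distance for the library above, ASSUMING insert size is the
# full fragment length (both 50 bp reads plus the gap between them).
read_len=50
insert_size=300
inner_dist=$(( insert_size - 2 * read_len ))   # unsequenced gap between mates
echo "Mate inner distance: ${inner_dist} bp"
```

Under that assumption the gap is 200 bp, which is the kind of value TopHat's `-r`/`--mate-inner-dist` option asks for.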
Bioinformatics Analysis Workflow
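Software on BioHen used at each step:
• Quality control and adapter trimming: FastQC, cutadapt
• Reference alignment: BWA-MEM, Bowtie 2, TopHat
• Variant detection: GATK, SAMtools, Pindel, BreakDancer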
FastQC
Example Torque script: FastQC
Example Torque script: FastQC and cutadapt
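A sketch of what a FastQC plus cutadapt job script could look like, assuming a cutadapt release with paired-end trimming (`-A`/`-p`). The file names, the adapter sequence (AGATCGGAAGAGC, the common prefix of the Illumina TruSeq adapters) and the resource request are assumptions to replace with your own. The work lives in a hypothetical function, `qc_and_trim`, which runs only when Torque launched the script (`PBS_JOBID` is set).

```shell
#!/bin/bash
#PBS -N fastqc_cutadapt
#PBS -V
#PBS -l nodes=1:ppn=4,walltime=04:00:00
# QC + adapter trimming sketch. File names, adapter and resources are
# hypothetical placeholders; adjust them for your data.
set -euo pipefail

qc_and_trim() {
    local r1="$1" r2="$2" adapter="$3"
    mkdir -p qc_raw qc_trimmed                  # fastqc -o needs an existing dir
    fastqc -o qc_raw "$r1" "$r2"                # QC report on the raw reads
    # Trim the 3' adapter from read 1 (-a) and read 2 (-A); -o/-p write
    # both mates so the pairs stay in sync. The report goes to stdout.
    cutadapt -a "$adapter" -A "$adapter" \
        -o trimmed_R1.fastq.gz -p trimmed_R2.fastq.gz \
        "$r1" "$r2" > cutadapt_report.txt
    fastqc -o qc_trimmed trimmed_R1.fastq.gz trimmed_R2.fastq.gz   # QC after trimming
}

# Run only under Torque; elsewhere the script just defines the function.
if [[ -n "${PBS_JOBID:-}" ]]; then
    cd "$PBS_O_WORKDIR"
    qc_and_trim sample_R1.fastq.gz sample_R2.fastq.gz AGATCGGAAGAGC
fi
```

Comparing the reports in `qc_raw` and `qc_trimmed` shows whether adapter content dropped after trimming.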
Reference Alignment
nano ~/.bashrc (the .bashrc file in your home directory)
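One common reason to edit `~/.bashrc` at this step is to put aligner executables on PATH in every new shell; that purpose is an assumption about what the slide showed. Instead of editing by hand, a line can be appended only if it is not already there. `add_path_to_bashrc` is a hypothetical helper and the tool directory is your own choice.

```shell
#!/usr/bin/env bash
# Hypothetical helper: append 'export PATH="<dir>:$PATH"' to a bashrc file
# unless that exact line is already present, so repeated runs are harmless.
add_path_to_bashrc() {
    local dir="$1"
    local rc="${2:-$HOME/.bashrc}"               # defaults to your ~/.bashrc
    local line="export PATH=\"$dir:\$PATH\""     # $PATH kept literal in the file
    touch "$rc"
    if ! grep -qxF -- "$line" "$rc"; then        # whole-line, fixed-string match
        echo "$line" >> "$rc"
    fi
}

# Usage (directory is hypothetical):
#   add_path_to_bashrc "$HOME/software/bwa"
#   source ~/.bashrc      # reload in the current shell
```

Jobs submitted with `#PBS -V` then inherit the updated PATH from the head node session.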
Reference Alignment
Example Torque script
qstat
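A sketch of a paired-end BWA-MEM alignment job, assuming samtools 1.x (for `samtools sort -o`). Reference and read file names, the sample name, thread count and resources are hypothetical placeholders, and `align_pair` is a hypothetical function that runs only when Torque launched the script. A read group is added with `-R` because GATK requires one downstream.

```shell
#!/bin/bash
#PBS -N bwa_mem_align
#PBS -V
#PBS -l nodes=1:ppn=8,walltime=24:00:00
# Paired-end alignment sketch. File names, sample name and resources are
# hypothetical placeholders; assumes samtools 1.x.
set -euo pipefail

align_pair() {
    local ref="$1" r1="$2" r2="$3" prefix="$4" threads="${5:-1}"
    # Build the BWA index once per reference (creates ref.fa.bwt etc.).
    [[ -f "$ref.bwt" ]] || bwa index "$ref"
    # Align both mates; -R adds a read group (ID/SM) required by GATK.
    bwa mem -t "$threads" -R "@RG\tID:$prefix\tSM:$prefix" \
        "$ref" "$r1" "$r2" > "$prefix.sam"
    # Coordinate-sort into BAM, index it, and drop the large SAM file.
    samtools sort -@ "$threads" -o "$prefix.sorted.bam" "$prefix.sam"
    samtools index "$prefix.sorted.bam"
    rm "$prefix.sam"
}

if [[ -n "${PBS_JOBID:-}" ]]; then
    cd "$PBS_O_WORKDIR"
    threads=8   # keep in sync with ppn above
    align_pair ref.fa trimmed_R1.fastq.gz trimmed_R2.fastq.gz sample1 "$threads"
fi
```

After `qsub`, `qstat` shows whether the job is queued (Q), running (R) or complete (C).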
Variant Detection
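A sketch of SNP/InDel calling with GATK's HaplotypeCaller, assuming the GATK4 `gatk` wrapper is on PATH (GATK3 instead uses `java -jar GenomeAnalysisTK.jar -T HaplotypeCaller` with different options). File names are hypothetical, and `call_variants` is a hypothetical function that runs only under Torque. Structural variant callers such as Pindel and BreakDancer are not covered here.

```shell
#!/bin/bash
#PBS -N gatk_haplotypecaller
#PBS -V
#PBS -l nodes=1:ppn=4,walltime=48:00:00
# Variant calling sketch, ASSUMING GATK4 ("gatk" wrapper). File names and
# resources are hypothetical placeholders.
set -euo pipefail

call_variants() {
    local ref="$1" bam="$2" out_vcf="$3"
    # GATK needs a .fai index and a .dict sequence dictionary beside ref.fa.
    [[ -f "$ref.fai" ]] || samtools faidx "$ref"
    local dict="${ref%.*}.dict"                  # ref.fa -> ref.dict
    [[ -f "$dict" ]] || gatk CreateSequenceDictionary -R "$ref"
    # Call SNPs and small InDels from the sorted, read-grouped BAM.
    gatk HaplotypeCaller -R "$ref" -I "$bam" -O "$out_vcf"
}

if [[ -n "${PBS_JOBID:-}" ]]; then
    cd "$PBS_O_WORKDIR"
    call_variants ref.fa sample1.sorted.bam sample1.vcf.gz
fi
```

For a disease population with many samples, GATK also offers joint genotyping across samples, which this single-sample sketch does not attempt.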
Questions?