BioHen: Basic commands and an example of NGS Analysis
Erin Crowgey, VP Bioinformatics Student Association

Basic Unix Commands
cd dirname                  change current working directory
pwd                         print working directory
ls -l                       list files in long format
cp filename1 filename2      copy file
mv filename1 filename2      move (or rename) file
rm filename                 remove file
diff filename1 filename2    list differences between files
mkdir dirname               make a new directory
gzip filename               compress file
gunzip filename.gz          uncompress file
more filename               look at file
head filename               look at top of file
tail filename               look at bottom of file
which program               identify location of executable
http://www.bioinformatics.babraham.ac.uk/training.html#unix
http://mally.stanford.edu/~sr/computing/basic-unix.html
http://www.math.utah.edu/lab/unix/unix-commands.html

Editing Text Documents
• Nano
  – Text editor using a command line interface
  – nano filename

Editing Text Documents
• vi
  – Screen-oriented text editor originally created for the Unix operating system
  – vi filename

Documentation BioHen
• List of software installed on the BioHen cluster
  – http://bioinformatics.udel.edu/Core/BioHen-Software
• Brief description of the BioHen computing cluster
  – http://bioit.dbi.udel.edu/BIOHEN-Cluster.html

Software BioHen
http://bioinformatics.udel.edu/Core/BioHen-Software

PATH Variable
• Environment variable
• It is a colon-delimited list of directories that your shell searches through when you enter a command
• In general, each executing process or user session has its own PATH setting

BioHen Computing Cluster
http://bioit.dbi.udel.edu/BIOHEN-Cluster.html

TORQUE Resource Manager
• Important control lines
  – #PBS -N name_of_job
    sets the name of the job
  – #PBS -V
    ensures all your environment variables from the head node are passed to the execution node
  – #PBS -l
    provides a list of resources required for the job (nodes, ppn, walltime)

TORQUE Resource Manager
• qsub name_of_script.sh
  – Submits your job to Torque for execution
• qstat
  – View the status of Torque jobs

Example NGS Analysis BioHen

NGS (Next-Gen Sequencing) Data Analysis
Analysis support for
• RNA-Seq
• miRNA
• Resequencing: SNP/InDel
• De novo Genome Assembly
• Genome Structural Variation / Copy Number Variation (CNV)
• ChIP-Seq
• Reduced Representation
• Amplicon Library (16S rRNA)
• Metagenome
• Metatranscriptome

Example Human NGS Analysis
• Goal is to detect SNPs within a disease population
  – Illumina
    • paired-end 50 bp (insert size of 300 bp)
    • whole genome

Bioinformatics Analysis Workflow
Software BioHen
• Fastqc, cutadapt
• BWA-mem, bowtie2, tophat
• GATK, Samtools, Pindel, BreakDancer

Fastqc
• Example torque script: fastqc
• Example torque script: fastqc and cutadapt

Reference Alignment
• nano .bashrc (home directory)

Reference Alignment
• Example torque script
• qstat

Variant Detection

Questions?
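
The "Example torque script" slides name the scripts, but their contents are not reproduced in this text. As a rough sketch only, a Torque job for fastqc built from the #PBS control lines listed earlier might look like the following. The job name, resource values, input file names, and output directory are placeholders chosen for illustration; they are not taken from the slides or from the BioHen configuration.

```shell
#!/bin/bash
# Sketch: write a minimal Torque job script for fastqc to a file.
# Assumptions (hypothetical, not from the slides): job name, resource
# values, sample file names, and the fastqc_out directory.

cat > fastqc_job.sh <<'EOF'
#!/bin/bash
#PBS -N fastqc_example
#PBS -V
#PBS -l nodes=1:ppn=2,walltime=01:00:00

# Torque starts a job in the home directory by default;
# change to the directory from which qsub was run.
cd "$PBS_O_WORKDIR"

# fastqc requires the output directory to exist already.
mkdir -p fastqc_out
fastqc -o fastqc_out sample_R1.fastq.gz sample_R2.fastq.gz
EOF

# On the cluster head node (not executed here):
#   qsub fastqc_job.sh    # submit the job to Torque
#   qstat                 # view the status of Torque jobs
```

Running this block only writes fastqc_job.sh to the current directory; submitting it with qsub and checking it with qstat happens on the cluster head node.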