R Bioinformatics Cookbook

Book description

Over 60 recipes to model and handle real-life biological data using modern libraries from the R ecosystem

Key Features

  • Apply modern R packages to handle biological data using real-world examples
  • Represent biological data with advanced visualizations suitable for research and publications
  • Handle real-world problems in bioinformatics such as next-generation sequencing, metagenomics, and automating analyses

Book Description

Handling biological data effectively requires in-depth knowledge of machine learning techniques and computational skills, along with an understanding of how to use tools such as edgeR and DESeq2. With the R Bioinformatics Cookbook, you’ll explore all this and more, tackling common and not-so-common challenges in the bioinformatics domain using real-world examples.

This book uses a recipe-based approach to show you how to perform practical research and analysis in computational biology with R. You will learn how to analyze your data effectively with the latest tools in Bioconductor, ggplot2, and the tidyverse. The book guides you through the essential Bioconductor tools to help you understand and carry out protocols in RNAseq, phylogenetics, genomics, and sequence analysis. As you progress, you will see how machine learning techniques can be applied in the bioinformatics domain. You will also develop key computational skills, such as creating reusable workflows in R Markdown and building packages for code reuse.

By the end of this book, you’ll have gained a solid understanding of the most important and widely used techniques in bioinformatic analysis and the tools you need to work with real biological data.

What you will learn

  • Employ Bioconductor to determine differential expression in RNAseq data
  • Run SAMtools and develop pipelines to find single nucleotide polymorphisms (SNPs) and indels
  • Use ggplot to create and annotate a range of visualizations
  • Query external databases with Ensembl to find functional genomics information
  • Execute large-scale multiple sequence alignment with DECIPHER to perform comparative genomics
  • Use d3.js and Plotly to create dynamic and interactive web graphics
  • Use k-nearest neighbors, support vector machines, and random forests to find groups and classify data

Who this book is for

This book is for bioinformaticians, data analysts, researchers, and R developers who want to address intermediate-to-advanced biological and bioinformatics problems by learning through a recipe-based approach. A working knowledge of the R programming language and basic knowledge of bioinformatics are prerequisites.

Table of contents

  1. Title Page
  2. Copyright and Credits
    1. R Bioinformatics Cookbook
  3. About Packt
    1. Why subscribe?
  4. Contributors
    1. About the author
    2. About the reviewer
    3. Packt is searching for authors like you
  5. Preface
    1. Who this book is for
    2. What this book covers
    3. To get the most out of this book
      1. Download the example code files
      2. Download the color images
      3. Conventions used
    4. Sections
      1. Getting ready
      2. How to do it...
      3. How it works...
      4. There's more...
      5. See also
    5. Get in touch
      1. Reviews
  6. Performing Quantitative RNAseq
    1. Technical requirements
    2. Estimating differential expression with edgeR
      1. Getting ready
      2. How to do it...
        1. Using edgeR from a count table
        2. Using edgeR from an ExpressionSet object
      3. How it works...
        1. Using edgeR from a count table
        2. Using edgeR from an ExpressionSet object
    3. Estimating differential expression with DESeq2
      1. Getting ready
      2. How to do it...
        1. Using DESeq2 from a count matrix
        2. Using DESeq2 from an ExpressionSet object
      3. How it works...
        1. Using DESeq2 from a count matrix
        2. Using DESeq2 from an ExpressionSet object
    4. Power analysis with powsimR
      1. Getting ready
      2. How to do it...
      3. How it works...
      4. There's more...
    5. Finding unannotated transcribed regions
      1. Getting ready
      2. How to do it...
      3. How it works...
      4. There's more...
    6. Finding regions showing high expression ab initio with bumphunter
      1. Getting ready
      2. How to do it...
      3. How it works...
      4. There's more...
    7. Differential peak analysis
      1. Getting ready
      2. How to do it...
      3. How it works...
    8. Estimating batch effects using SVA
      1. Getting ready
      2. How to do it...
      3. How it works...
    9. Finding allele-specific expressions with AllelicImbalance
      1. Getting ready
      2. How to do it...
      3. How it works...
      4. There's more...
    10. Plotting and presenting RNAseq data
      1. Getting ready
      2. How to do it...
      3. How it works...
  7. Finding Genetic Variants with HTS Data
    1. Technical requirements
    2. Finding SNPs and indels from sequence data using VariantTools
      1. Getting ready
      2. How to do it...
      3. How it works...
      4. There's more...
      5. See also
    3. Predicting open reading frames in long reference sequences
      1. Getting ready
      2. How to do it...
      3. How it works...
      4. There's more...
    4. Plotting features on genetic maps with karyoploteR
      1. Getting ready
      2. How to do it...
      3. How it works...
      4. There's more...
      5. See also
    5. Selecting and classifying variants with VariantAnnotation
      1. Getting ready
      2. How to do it...
      3. How it works...
      4. See also
    6. Extracting information in genomic regions of interest
      1. Getting ready
      2. How to do it...
      3. How it works...
      4. There's more...
    7. Finding phenotype and genotype associations with GWAS
      1. Getting ready
      2. How to do it...
      3. How it works...
    8. Estimating the copy number at a locus of interest
      1. Getting ready
      2. How to do it...
      3. How it works...
      4. See also
  8. Searching Genes and Proteins for Domains and Motifs
    1. Technical requirements
    2. Finding DNA motifs with universalmotif
      1. Getting ready
      2. How to do it...
      3. How it works...
      4. There's more...
    3. Finding protein domains with PFAM and bio3d
      1. Getting ready
      2. How to do it...
      3. How it works...
      4. There's more...
    4. Finding InterPro domains
      1. Getting ready
      2. How to do it...
      3. How it works...
      4. There's more...
      5. See also
    5. Performing multiple alignments of genes or proteins
      1. Getting ready
      2. How to do it...
      3. How it works...
      4. There's more...
    6. Aligning genomic length sequences with DECIPHER
      1. Getting ready
      2. How to do it...
      3. How it works...
    7. Machine learning for novel feature detection in proteins
      1. Getting ready
      2. How to do it...
      3. How it works...
    8. 3D structure protein alignment with bio3d
      1. Getting ready
      2. How to do it...
      3. How it works...
      4. There's more...
  9. Phylogenetic Analysis and Visualization
    1. Technical requirements
    2. Reading and writing varied tree formats with ape and treeio
      1. Getting ready
      2. How to do it...
      3. How it works...
      4. See also
    3. Visualizing trees of many genes quickly with ggtree
      1. Getting ready
      2. How to do it...
      3. How it works...
      4. There's more...
    4. Quantifying differences between trees with treespace
      1. Getting ready
      2. How to do it...
      3. How it works...
      4. There's more...
    5. Extracting and working with subtrees using ape
      1. Getting ready
      2. How to do it...
      3. How it works...
      4. There's more...
    6. Creating dot plots for alignment visualization
      1. Getting ready
      2. How to do it...
      3. How it works...
    7. Reconstructing trees from alignments using phangorn
      1. Getting ready
      2. How to do it...
      3. How it works...
  10. Metagenomics
    1. Technical requirements
    2. Loading in hierarchical taxonomic data using phyloseq
      1. Getting ready
      2. How to do it...
      3. How it works...
      4. There's more...
      5. See also
    3. Rarefying counts and correcting for sample differences using metacoder
      1. Getting ready
      2. How to do it...
      3. How it works...
      4. There's more...
    4. Reading amplicon data from raw reads with dada2
      1. Getting ready
      2. How to do it...
      3. How it works...
      4. See also
    5. Visualizing taxonomic abundances with heat trees in metacoder
      1. Getting ready
      2. How to do it...
      3. How it works...
    6. Computing sample diversity with vegan
      1. Getting ready
      2. How to do it...
      3. How it works...
      4. See also
    7. Splitting sequence files into OTUs
      1. Getting ready
      2. How to do it...
      3. How it works...
  11. Proteomics from Spectrum to Annotation
    1. Technical requirements
    2. Representing raw MS data visually
      1. Getting ready
      2. How to do it...
      3. How it works...
    3. Viewing proteomics data in a genome browser
      1. Getting ready
      2. How to do it...
      3. How it works...
      4. There's more...
    4. Visualizing distributions of peptide hit counts to find thresholds
      1. Getting ready
      2. How to do it...
      3. How it works...
    5. Converting MS formats to move data between tools
      1. Getting ready
      2. How to do it...
      3. How it works...
    6. Matching spectra to peptides for verification with protViz
      1. Getting ready
      2. How to do it...
      3. How it works...
    7. Applying quality control filters to spectra
      1. Getting ready
      2. How to do it...
      3. How it works...
      4. There's more...
    8. Identifying genomic loci that match peptides
      1. Getting ready
      2. How to do it...
      3. How it works...
  12. Producing Publication and Web-Ready Visualizations
    1. Technical requirements
    2. Visualizing multiple distributions with ridgeplots
      1. Getting ready
      2. How to do it...
      3. How it works...
    3. Creating colormaps for two-variable data
      1. Getting ready
      2. How to do it...
      3. How it works...
      4. See also
    4. Representing relational data as networks
      1. Getting ready
      2. How to do it...
      3. How it works...
      4. There's more...
    5. Creating interactive web graphics with plotly
      1. Getting ready
      2. How to do it...
      3. How it works...
    6. Constructing three-dimensional plots with plotly
      1. Getting ready
      2. How to do it...
      3. How it works...
    7. Constructing circular genome plots of polyomic data
      1. Getting ready
      2. How to do it...
      3. How it works...
  13. Working with Databases and Remote Data Sources
    1. Technical requirements
    2. Retrieving gene and genome annotation from BioMart
      1. Getting ready
      2. How to do it...
      3. How it works...
    3. Retrieving and working with SNPs
      1. Getting ready
      2. How to do it...
      3. How it works...
      4. There's more...
      5. See also
    4. Getting gene ontology information
      1. Getting ready
      2. How to do it...
      3. How it works...
    5. Finding experiments and reads from SRA/ENA
      1. Getting ready
      2. How to do it...
      3. How it works...
      4. There's more...
    6. Performing quality control and filtering on high-throughput sequence reads
      1. Getting ready
      2. How to do it...
      3. How it works...
    7. Completing read-to-reference alignment with external programs
      1. Getting ready
      2. How to do it...
      3. How it works...
    8. Visualizing the quality control of read-to-reference alignments
      1. Getting ready
      2. How to do it...
      3. How it works...
  14. Useful Statistical and Machine Learning Methods
    1. Technical requirements
    2. Correcting p-values to account for multiple hypotheses
      1. Getting ready
      2. How to do it...
      3. How it works...
    3. Generating a simulated dataset to represent a background
      1. Getting ready
      2. How to do it...
      3. How it works...
    4. Learning groupings within data and classifying with kNN
      1. Getting ready
      2. How to do it...
      3. How it works...
    5. Predicting classes with random forests
      1. Getting ready
      2. How to do it...
      3. How it works...
      4. There's more...
    6. Predicting classes with SVM
      1. Getting ready
      2. How to do it...
      3. How it works...
    7. Learning groups in data without prior information
      1. Getting ready
      2. How to do it...
      3. How it works...
      4. There's more...
    8. Identifying the most important variables in data with random forests
      1. Getting ready
      2. How to do it...
      3. How it works...
    9. Identifying the most important variables in data with PCA
      1. Getting ready
      2. How to do it...
      3. How it works...
  15. Programming with Tidyverse and Bioconductor
    1. Technical requirements
    2. Making base R objects tidy
      1. Getting ready
      2. How to do it...
      3. How it works...
    3. Using nested dataframes
      1. Getting ready
      2. How to do it...
      3. How it works...
      4. There's more...
    4. Writing functions for use in dplyr::mutate()
      1. Getting ready
      2. How to do it...
      3. How it works...
    5. Working programmatically with Bioconductor classes
      1. Getting ready
      2. How to do it...
      3. How it works...
    6. Developing reusable workflows and reports
      1. Getting ready
      2. How to do it...
      3. How it works...
    7. Making use of the apply family of functions
      1. Getting ready
      2. How to do it...
      3. How it works...
  16. Building Objects and Packages for Code Reuse
    1. Technical requirements
    2. Creating simple S3 objects to simplify code
      1. Getting ready
      2. How to do it...
      3. How it works...
    3. Taking advantage of generic object functions with S3 classes
      1. Getting ready
      2. How to do it...
      3. How it works...
    4. Creating structured and formal objects with the S4 system
      1. Getting ready
      2. How to do it...
      3. How it works...
      4. See also
    5. Simple ways to package code for sharing and reuse
      1. Getting ready
      2. How to do it...
      3. How it works...
    6. Using devtools to host code from GitHub
      1. Getting ready
      2. How to do it...
      3. How it works...
    7. Building a unit test suite to ensure that functions work as you intend
      1. Getting ready
      2. How to do it...
      3. How it works...
    8. Using continuous integration with Travis to keep code tested and up to date
      1. Getting ready
      2. How to do it...
      3. How it works...
  17. Other Books You May Enjoy
    1. Leave a review - let other readers know what you think

Product information

  • Title: R Bioinformatics Cookbook
  • Author(s): Dan MacLean
  • Release date: October 2019
  • Publisher(s): Packt Publishing
  • ISBN: 9781789950694