Mastering Perl for Bioinformatics

Book description

Historically, programming hasn't been considered a critical skill for biologists. But now, with access to vast amounts of biological data contained in public databases, programming skills are increasingly in strong demand in biology research and development. Perl, with its highly developed capacities in string handling, text processing, networking, and rapid prototyping, has emerged as the programming language of choice for biological data analysis.

Mastering Perl for Bioinformatics covers the core Perl language and many of its module extensions, presenting them in the context of biological data and problems of pressing interest to the biological community. This book, along with Beginning Perl for Bioinformatics, forms a basic course in Perl programming. This second volume finishes the basic Perl tutorial material (references, complex data structures, object-oriented programming, use of modules--all presented in a biological context) and presents some advanced topics of considerable interest in bioinformatics.

The range of topics covered in Mastering Perl for Bioinformatics prepares the reader for enduring and emerging developments in critical areas of bioinformatics programming such as:

  • Gene finding
  • String alignment
  • Methods of data storage and retrieval (XML and databases)
  • Modeling of networks (graphs and Petri nets)
  • Graphics (Tk)
  • Parallelization
  • Interfacing with other programming languages
  • Statistics (PDL)
  • Protein structure determination
  • Biological models of computation (DNA Computers)

Biologists and computer scientists who have conquered the basics of Perl and are ready to move even further in their mastery of this versatile language will appreciate the author's well-balanced approach to applying Perl's analytical abilities to the field of bioinformatics. Full of practical examples and real-world biological problem solving, this book is a must for any reader wanting to move beyond beginner-level Perl in bioinformatics.

Table of contents

  1. A Note Regarding Supplemental Files
  2. Foreword
  3. Preface
    1. About This Book
    2. What You Need to Know to Use This Book
    3. Organization of This Book
    4. Conventions Used in This Book
    5. Comments and Questions
    6. Acknowledgments
  4. I. Object-Oriented Programming in Perl
    1. 1. Modular Programming with Perl
      1. What Is a Module?
      2. Why Perl Modules?
        1. Subroutines and Software Engineering
        2. Modules and Libraries
      3. Namespaces
        1. Namespaces Compared with Scoping: my and use strict
      4. Packages
      5. Defining Modules
      6. Storing Modules
      7. Writing Your First Perl Module
        1. An Example: Geneticcode.pm
        2. Expanding Geneticcode.pm
      8. Using Modules
        1. Exporting Names
      9. CPAN Modules
        1. What’s Available at CPAN?
        2. Searching CPAN
        3. Installing Modules Using CPAN.pm
        4. Using the Newly Installed CPAN Module
        5. Problems with CPAN Modules
      10. Exercises
    2. 2. Data Structures and String Algorithms
      1. Basic Perl Data Types
      2. References
        1. References to Scalars
          1. Dereferencing
          2. Anonymous data
        2. References of References
        3. References to Arrays
          1. The arrow operator
          2. Anonymous arrays
        4. References to Hashes
          1. Anonymous hashes
        5. References to Subroutines
          1. Anonymous subroutines
          2. Passing references to subroutines
          3. Returning references from subroutines
        6. Symbolic Versus Hard References
      3. Matrices
        1. Two-Dimensional Matrices
        2. Higher-Dimensional Matrices
        3. Sparse Arrays
      4. Complex Data Structures
        1. Hash with Array Values
        2. Two-Dimensional Array of Hashes
        3. Complex Data Structures
      5. Printing Complex Data Structures
      6. Data Structures in Action
        1. The Problem of String Matching
        2. Genetic Variability and String Matching
      7. Dynamic Programming
      8. Approximate String Matching
        1. Edit Distance
          1. A string matching program
          2. Analysis
      9. Resources
      10. Exercises
    3. 3. Object-Oriented Programming in Perl
      1. What Is Object-Oriented Programming?
        1. Why Object-Oriented Programming?
        2. Terminology
      2. Using Perl Classes (Without Writing Them)
      3. Objects, Methods, and Classes in Perl
        1. Perl Objects Are Usually Hashes
      4. Arrow Notation (->)
      5. Gene1: An Example of a Perl Class
      6. Details of the Gene1 Class
        1. Variable Names and Conventions
        2. Carp and croak
        3. The new Constructor Method
        4. Creating an Object with bless
        5. Using ref to Report an Object’s Class
        6. Initialize an Object with an Anonymous Hash
        7. Accessor Methods
      7. Gene2.pm: A Second Example of a Perl Class
        1. Closures
        2. Tracking Class Data from the Constructor Method
        3. Accessor and Mutator Methods
      8. Gene3.pm: A Third Example of a Perl Class
        1. Testing Gene3.pm
      9. How AUTOLOAD Works
        1. Defining Global Variables
        2. AUTOLOAD Simplifies Writing Methods
          1. Bypassing use strict
          2. AUTOLOAD arguments
          3. Using naming conventions to write code: get_ and set_
          4. AUTOLOAD accessors
          5. AUTOLOAD mutators
          6. AUTOLOAD speedup
      10. Cleaning Up Unused Objects with DESTROY
      11. Gene.pm: A Fourth Example of a Perl Class
        1. Building Gene.pm
        2. Defining Attributes and Their Behaviors
        3. Initializing the Attributes of a New Object
          1. The newer new constructor
          2. The clone constructor
        4. Permissions
        5. Gene.pm Test Program and Output
      12. How to Document a Perl Class with POD
      13. Additional Topics
        1. Using Class::Struct to Define Classes
        2. Class Inheritance
        3. Bioperl
      14. Resources
      15. Exercises
    4. 4. Sequence Formats and Inheritance
      1. Inheritance
      2. FileIO.pm: A Class to Read and Write Files
        1. Analysis of FileIO
          1. The constructor method
          2. stat and localtime functions
          3. The write method
          4. AUTOLOAD
        2. Finishing FileIO
        3. Testing the FileIO Class Module
      3. SeqFileIO.pm: Sequence File Formats
        1. Analysis of SeqFileIO.pm
          1. The power of inheritance
          2. A new read method
        2. New Methods: is, parse, and put
          1. is_ methods
          2. put_ methods
          3. parse_ methods
        3. Testing SeqFileIO.pm
        4. Results
      4. Resources
      5. Exercises
    5. 5. A Class for Restriction Enzymes
      1. Envisioning an Object
      2. Rebase.pm: A Class Module
        1. Attributes: Short and Sweet
        2. Creating a Rebase Object
        3. Methods for the Rebase Class
        4. parse_rebase
        5. Methods to Translate Nucleotides to Regular Expressions
        6. Testing the Module
      3. Restriction.pm: Finding Recognition Sites
        1. The Restriction.pm Module
          1. Initializing Restriction objects
          2. The methods explained
          3. Documentation
      4. Drawing Restriction Maps
        1. Storing Graphics Output in an Attribute
        2. The Restrictionmap Class
          1. Adding graphics capability to the class
          2. Creation of the graphic
          3. Running the program
      5. Resources
      6. Exercises
  5. II. Perl and Bioinformatics
    1. 6. Perl and Relational Databases
      1. One Perl, Many Databases
      2. Popular Relational Databases
      3. Relational Database Definitions
      4. Structured Query Language
        1. SQL Commands
          1. Creating a database
          2. Creating tables
          3. Populating the tables
      5. Administering Your Database
        1. Adding Users
        2. Backup and Reloading
      6. Relational Database Design
      7. Perl DBI and DBD Interface Modules
        1. Installing and Configuring Perl DBI and DBD Modules
        2. Handling Tab-Delimited Input Files
        3. DBI Examples
          1. homologs.tabs
          2. homologs.load
          3. An SQL query
      8. A Rebase Database Implementation
        1. RebaseDB Class: Accessing Restriction Enzyme Data
        2. testRebaseDB: A Testing Program
        3. Analyzing RebaseDB
      9. Additional Topics
      10. Resources
      11. Exercises
    2. 7. Perl and the Web
      1. How the Web Works
        1. URLs
        2. HTML
          1. HTML web page example
          2. HTML directives
        3. HTTP
      2. Web Servers and Browsers
      3. The Common Gateway Interface
        1. Writing a CGI Program
        2. Installing a CGI Program
        3. Using the CGI.pm Module
        4. Testing a CGI Program
      4. Rebase: Building Dynamic Web Pages
        1. Installing webrebase1
        2. Inside webrebase1
      5. Exercises
    3. 8. Perl and Graphics
      1. Computer Graphics
        1. Basic Graphics Concepts
        2. Graphics and File Formats
      2. GD
        1. Installing GD
        2. Using GD
      3. Adding GD Graphics to Restrictionmap.pm
        1. Designing Graphics
          1. Applying color
          2. Calling the method
        2. Adding JPEG Output to Restrictionmap.pm
      4. Making Graphs
      5. Resources
      6. Exercises
    4. 9. Introduction to Bioperl
      1. The Growth of Bioperl
      2. Installing Bioperl
      3. Testing Bioperl
        1. Second Test
        2. Third Test
        3. Fourth Test
      4. Bioperl Problems
      5. Overview of Objects
      6. bptutorial.pl
      7. bptutorial.pl: sequence_manipulation Demo
      8. Using Bioperl Modules
  6. III. Appendixes
    1. A. Perl Summary
      1. Command Interpretation
      2. Comments
      3. Scalar Values and Scalar Variables
        1. Strings
        2. Numbers
        3. References
        4. Scalar Variables
      4. Assignment
      5. Statements and Blocks
      6. Arrays
      7. Hashes
      8. Complex Data Structures
      9. Operators
      10. Operator Precedence
      11. Basic Operators
        1. Arithmetic Operators
        2. Bitwise Operators
        3. String Operators
        4. File Test Operators
      12. Conditionals and Logical Operators
        1. true and false
        2. Logical Operators
        3. Using Logical Operators for Control Flow
        4. The if Statement
      13. Binding Operators
      14. Loops
      15. Input/Output
        1. Input from Files
        2. Input from STDIN
        3. Input from Files Named on the Command Line
        4. Output Commands
          1. Output to STDOUT, STDERR, and files
      16. Regular Expressions
        1. Overview
        2. Metacharacters
          1. Escaping with \
          2. Alternation with |
          3. Grouping with ( )
          4. Character classes
          5. Matching any character with a dot
          6. Beginning and end of strings with ^ and $
          7. Quantifiers
          8. Making quantifiers match minimally with ?
        3. Capturing Matched Patterns
        4. Metasymbols
        5. Extending Regular-Expression Sequences
        6. Pattern Modifiers
      17. Scalar and List Context
      18. Subroutines
      19. Modules and Packages
      20. Object-Oriented Programming
      21. Built-in Functions
    2. B. Installing Perl
      1. Installing Perl on Your Computer
        1. Perl May Already Be Installed
      2. Versions of Perl
      3. Internet Access
      4. Downloading
        1. Binary Versus Source Code
        2. Perl for Unix and Linux
        3. Perl for Macintosh
        4. Perl for Windows
      5. How to Run Perl Programs
        1. Running Perl Programs on Unix or Linux
        2. Running Perl Programs on the Macintosh
        3. Running Perl Programs on Windows
      6. Finding Help
  7. Index
  8. About the Author
  9. Colophon
  10. Copyright

Product information

  • Title: Mastering Perl for Bioinformatics
  • Author(s): James Tisdall
  • Release date: September 2003
  • Publisher(s): O'Reilly Media, Inc.
  • ISBN: 9780596003074