Sequence Analysis in a Nutshell: A Guide to Common Tools and Databases
Publisher: O'Reilly Media
Final Release Date: January 2003
Pages: 304

Gene sequence data is the most abundant type of data available, and if you're interested in analyzing it, you'll find a wealth of computational methods and tools to help you. In fact, finding the data is not the challenge at all; rather, it is dealing with the plethora of flat file formats used to process the sequence entries and trying to remember what their specific field codes mean. If you survive by surrounding yourself with well-thumbed hard copies of readme files or remembering exactly where to look for the details when you need them, then Sequence Analysis in a Nutshell: A Guide to Common Tools and Databases is for you. This book is a handy resource, as well as an invaluable reference, for anyone who needs to know about the practical aspects and mechanics of sequence analysis.

Sequence Analysis in a Nutshell: A Guide to Common Tools and Databases pulls together all of the vital information about the most commonly used databases, analytical tools, and tables used in sequence analysis. The book is partitioned into three fundamental areas to help you maximize your use of the content. The first section, "Databases," contains examples of flat files from key databases (GenBank, EMBL, SWISS-PROT), the definitions of the codes or fields used in each database, and the sequence feature types/terms and qualifiers for the nucleotide and protein databases. The second section, "Tools," provides the command line syntax for popular applications such as ReadSeq, MEME/MAST, BLAST, ClustalW, and the EMBOSS suite of analytical tools. The third section, "Appendixes," concentrates on information essential to understanding the individual components that make up a biological sequence. The tables in this section include nucleotide and protein codes, genetic codes, and other relevant information.

Written in O'Reilly's enormously popular, straightforward "Nutshell" format, this book draws together essential information for bioinformaticians in industry and academia, as well as for students. If sequence analysis is part of your daily life, you'll want this easy-to-use book on your desk.

Customer Reviews




Sequence Analysis in a Nutshell Review

By Patrick Fleury


Title: Sequence Analysis in a Nutshell (SAIAN)

Authors: Markel, S and Leon, D.

Publisher: O'Reilly and Associates

Year: 2003

The basic idea behind sequence analysis is the classification of DNA or protein sequences in terms of other known DNA or protein sequences. To take a simple case, suppose there is a laboratory team that decodes a section of human - or mouse or rat - DNA and finds it corresponds to a sequence of letters, perhaps something like AGTTCGATTGATTGCA. (This is a fairly small sequence.) The team might want to find out what is already known about this particular sequence. To do this, they would compare their sequence to a known database of sequences.

This database searching is not a trivial matter because, not only would they want to find out if there are any exact matches for their sequence, they might also want to find out if there are any approximate matches. Here, approximate takes on a new meaning because it not only means sequences that share a large number of exact matches, but also sequences where parts of their sequence appear separated by other letters. For example, if you consider the above sequence, it appears in the sequence


except that there are a few other letters interspersed within it. Or, they might be happy to find a sequence like the above except that some of the letters have been changed to other letters. For example, the sequence ATTTCGGTAGATGCA is the above sequence with a couple of random letter changes.

Such alignments, although highly unintuitive to the uninitiated, might be useful to the biological researcher.
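To make the two kinds of approximate match described above a little more concrete, here is a minimal Python sketch. It is only an illustration and not how BLAST or any tool covered in the book actually works: it assumes a plain in-order subsequence test for the "interspersed letters" case and the textbook Levenshtein edit distance for the "changed letters" case. The function names are hypothetical, chosen just for this example.

```python
def is_subsequence(query: str, target: str) -> bool:
    """Return True if the letters of query appear in target in order.

    Other letters may be interspersed between them in target.
    """
    remaining = iter(target)
    # 'letter in remaining' advances the iterator, so order is enforced.
    return all(letter in remaining for letter in query)


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance between a and b.

    Counts the minimum number of single-letter substitutions,
    insertions, and deletions needed to turn a into b.
    """
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # match or substitute
            ))
        previous = current
    return previous[-1]


# The two sequences quoted in the review text.
query = "AGTTCGATTGATTGCA"
variant = "ATTTCGGTAGATGCA"

# A made-up target with extra letters interspersed (illustrative only).
print(is_subsequence("AGTT", "AxGxTxT"))
print(edit_distance(query, variant))
```

Real search tools go well beyond this: they score matches, mismatches, and gaps differently and use heuristics so that a whole database can be searched in reasonable time.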

The team might also want to search not only databases of human DNA but also mouse DNA, rat DNA or perhaps even the worm, C. elegans.

I could go on with this, but I am merely trying to convince you that searching for one sequence among other sequences is not just a matter of bringing up a regular expression engine and letting it do its job. Instead, it's a very sophisticated process with lots of variations and parameters. Indeed, a lot of work has gone into tweaking the particular types of algorithm to use in such searches. These algorithms have been codified into families with titles such as BLAST (Basic Local Alignment Search Tool), BLAT (BLAST-Like Alignment Tool), and ClustalW, and they are available in various places on the web.

This brings us to the volume under discussion. While it is possible to find out about these tools by searching the net, it would be useful to have one source that contained information about all of them in one easy-to-use format. This volume is that source.

This is another of O'Reilly's Nutshell series. Like the others in the series such as "Perl in a Nutshell", "C++ in a Nutshell" etc., the volume does not have as its main point the explication of the theory of sequence analysis. You will need to look elsewhere for that. Instead, it collects in one place a lot of information about the tools that are useful.

The first five chapters are devoted to clear descriptions of the common data formats you will run into in sequence analysis. These include FASTA, SWISS-PROT, GenBank and some of their relatives.

The next few chapters are devoted to the tools that make these analyses work. Surprisingly, BLAST, one of the most popular of the search algorithms, gets pretty short shrift. It only has about seven pages devoted to it. This might be due to the fact that O'Reilly recently published a book devoted entirely to BLAST. (There will be more about that later.)

The short space given to BLAST might also be because the authors wanted to save a lot of space for EMBOSS (European Molecular Biology Open Software Suite). EMBOSS is a suite of over 100 programs for sequence analysis that have been released as open source and whose code is available on the web. Anyone who wants to see real working C code that performs sequence analysis would do well to download these programs and study them. Markel and Leon devote almost 170 pages to this suite and all of its possible options and flags. By the way, the section on EMBOSS is really the only place where a particular programming language comes up in the book, and even there it doesn't really appear, because you need to download the code to see it. There is no Perl in "SAIAN".

Besides data formats and descriptions of tools, the book also has some other useful parts. For example, it has appendices devoted to amino acid and nucleotide tables, and genetic codes. It also lists a lot of websites where interested parties can go to find more information.

This book looks useful for anyone who would like to have a good single reference for sequence analysis tools.

All of the above notwithstanding, the book is a manual and sometimes reading it is just like reading a Unix man page. It may be informative, but, if you really want to know what is going on, you may need to look elsewhere for some further explanation. In particular, the treatment of BLAST in "SAIAN" does not really tell you what is going on. I would be much harder on "SAIAN" were it not for the fact that O'Reilly recently published another book titled simply "BLAST".

"BLAST", which was written by Ian Korf, Mark Yandell and Joseph Bedell, is subtitled "An Essential Guide to the Basic Local Alignment Search Tool" and it is indeed that. It contains not only a detailed introduction to BLAST, but also a short introduction to the theory behind BLAST. If you want to find out a little bit about basic genetics and how BLAST fits into sequence alignment, you could do a lot worse than read this book. It goes through the algorithms in some detail and actually shows you some elementary Perl code to carry out some of the algorithms. Furthermore, it contains an introduction to some of the statistical methods behind the code. (If you want to go deeply into the theory behind the algorithms, I recommend the book by Durbin, Eddy, Krogh and Mitchison referenced at the end of this review.)

In summary, "Sequence Analysis in a Nutshell" is a useful tool.

It collects in one place common data formats.

It also collects references to common algorithms such as BLAST and BLAT.

It has a large section on EMBOSS.

It has appendices on genetic codes and nucleotides.

It has a lot of references to URLs for finding more information and for downloading code.

It does not have enough about BLAST, but the book called "BLAST", also from O'Reilly, provides a very good reference for that tool along with other more theoretical information.

Finally, I want to point out that the animal on the cover of SAIAN works as symbolism on several levels. It is a liger, a cross between a male lion and a female tiger. (A cross between a male tiger and a female lion is called a tigon. Ah, the wonderful things you learn from reading the colophon of an O'Reilly book.) It is not only fitting that such a mixture of genes be on the cover of this book, but it is also nice to note that the authors work for LION bioscience.

Patrick Fleury

Books referenced in the above

Durbin, R., Eddy, S., Krogh, A. and Mitchison, G., 1998, Biological Sequence Analysis, New York: Cambridge University Press

Korf, I., Yandell, M. and Bedell, J., 2003, BLAST, Sebastopol: O'Reilly

Markel, S. and Leon, D., 2003, Sequence Analysis in a Nutshell, Sebastopol: O'Reilly
