Spidering Hacks
100 Industrial-Strength Tips & Tools
Publisher: O'Reilly Media
Released: October 2003
Pages: 426
Customer Reviews

4.2 (based on 6 reviews)

Ratings Distribution

  • 5 Stars (2)
  • 4 Stars (3)
  • 3 Stars (1)
  • 2 Stars (0)
  • 1 Star (0)

Reviews

Reviewed by 6 customers

(1 of 1 customers found this review helpful)

 
3.0

Not as helpful as I would like

By garthm9

from Atlanta, GA

About Me Sys Admin

Verified Reviewer

Pros

  • Accurate
  • Concise
  • Easy to understand
  • Helpful examples

Cons

    Best Uses

    • Expert
    • Intermediate

    Comments about oreilly Spidering Hacks:

    Overview:
    This book is a mashup of scripts that can be used to gather information from a number of resources on the web and put it in a format of your choosing. The book relies heavily on the Perl scripting language. As such, many of these scripts are not very complicated because they use Perl modules to do the heavy lifting in their various tasks, so the scripts become front-ends to more complicated processes. As a member of a club, I was given a copy of the book to read and review. My goal in reading the book was to look for ways spidering could be used in the corporate world. Specifically, I wanted ways to aggregate data from various reports and management services into a format that was more useful to me.

    Pros:
    The breadth of scripts is impressive. A reader would be hard-pressed to come up with a scenario that involves getting data off of a website that is not covered in this book at some level. The examples are fairly generic, but the author not only explains how you might use them in your situation, but in many cases also gives advanced tips and examples that go beyond the basic ideas presented. Since most of the scripts are based on one or more Perl modules, the scripts are fairly simple. A (trained) beginner's level of understanding is all that is necessary to copy a few of these scripts and modify a few key lines to make them work in other situations.

    Cons:
    The book is starting to get a little dated. That being said, the basic technologies that I could identify remain applicable. Still, while aggregating web details is a nice idea, most of the aggregation ideas in the book have already been implemented by someone somewhere on the web. If you have a particular need, I would suggest a serious Google search for a turnkey solution before embarking on one of these projects. The main issue I have with this book is the flip side of one of its strengths. By using Perl, most of the complicated work is done in the background by modules that are hidden from the view of the user. If you are planning to use any other language, you are suddenly faced with not only translating the basic script functionality presented in the book, but also dissecting and replicating the Perl modules. Experienced programmers can figure out how to replicate these modules in other scripting languages, but that is a fairly advanced task. This turns a simple-to-moderate project into a daunting one. As I mentioned above, my focus was using these techniques in a corporate environment. Since I deal exclusively with MS OSes, I use PowerShell as my scripting language of choice. I was able to use PowerShell to replicate one of the more basic ideas, but due to the differences in code, I basically had to start from scratch. Beyond this one project, the task of replicating many of the Perl services has been too complicated for me to do in my limited time. I was looking for quick and simple ways of repackaging data, and that is not what I found, given the code-language translation issues.

     
    5.0

    Good book

    By garyamort

    from Undisclosed

    Comments about oreilly Spidering Hacks:

    Gives a lot of great ideas for spidering. Emphasis is on Perl, with some occasional diversions into other languages for specific functions.

    Personally, I'd prefer either a broader mix of languages, or restriction to one language. Still, overall a great book to give you a lot of ideas.

     
    4.0

    Spidering Hacks Review

    By Doug Smith

    from Undisclosed

    Comments about oreilly Spidering Hacks:

    I enjoy the hacks series a load! The toys you can use immediately are great fun. I immediately borrowed the idea from the "automatically find blogs of your interest" chapter, and modified it to find "friends of friends" for a blog-happy girlfriend.

    What I liked most about the book is that it really broadened my Perl horizons, especially the section "building a toolkit". A great start to using some Perl modules that help you get the job done -fast-.

    Being someone who has built a variety of spiders/scrapers, I appreciated the insight from the authors, and appreciate finding the info in a concise, condensed reference... something unknown to builders (and would-be builders) of crawlers in the past.

     
    4.0

    Spidering Hacks Review

    By Bill Day

    from Undisclosed

    Comments about oreilly Spidering Hacks:

    Spidering Hacks

    Authors: Kevin Hemenway & Tara Calishain

    Publisher: O'Reilly & Associates

    Price: $24.95

    Pages: 402

    Web site:

    Reviewed by Bill Day, Grand Rapids (Michigan) PerlMongers

    4.5 stars (5 star scale). This book is not perfect; the authors may have tried to cover too much material. The material is very time sensitive, so the book needed to be rushed together, and it will have little value in 5 years. I wanted to give the book a higher rating; I tried to think of a better way to present the material in 400 pages and couldn't. Still, there are just too many rough edges for a 5 star book.

    As a member of O'Reilly's "Hacks" series, "Spidering Hacks" is different than the typical O'Reilly book. This book presents breadth of topic rather than depth. The format is 100 hacks (mostly Perl on Linux with an odd Python, Java, or Windows hack), some written by Hemenway & Calishain and many written by guest authors, organized into 6 chapters. The number of authors leads to a variety of styles in both English and Perl. If you treat the book as a super magazine (time sensitive short articles), you won't be disappointed.

    Chapter 1 – Walking Softly (Hacks 1-7)

    Chapter 1 provides general guidelines on spider/scraper etiquette and good practices, which the rest of the book seems to ignore.

    Chapter 2 – Assembling a toolkit (Hacks 8-32)

    An overview of several modules and techniques with working examples. More experienced Perl mongers may find this material remedial.

    Chapter 3 – Collecting media files (Hacks 33-42)

    The hacks on POP3 attachments and Usenet may be worth the price of the book for those trying to solve a particular problem.

    Chapter 4 – Gleaning data from databases (Hacks 43-89)

    Over 1/2 the book is dedicated to this chapter. Initially it appears that these are very specific solutions for a narrow audience. Closer reading reveals a variety of techniques that can be used in many circumstances.

    Chapter 5 – Maintaining your collections (Hacks 90-93)

    Not much here. Cron is covered much better in other works.

    Chapter 6 – Giving back to the world (Hacks 94-100)

    Essentially how to be nice to spiders. Why Net::AIM is covered here seems arbitrary. Hack #100 "Going beyond the book" is nothing but fluff.

    An example of how I used the book may be illustrative. I wanted to scrape TV listings, but hack #73 "Scraping TV Listings" has been made obsolete by a modification to tvguide.com. I was able to quickly use the toolkit presented in chapter 2 to scrape one of the many other web sites with TV listings. I expect this to be typical: sites change, and spiders and scrapers need to adapt.

    Spidering Hacks is an odd collection of articles that seem to cover the remedial to intermediate skill ranges. Nobody will benefit from all 100 hacks, but most of us will find $24.95 of value in the hacks that cause us to go "How cool!".

     
    4.0

    Spidering Hacks Review

    By Mike Sipin

    from Undisclosed

    Comments about oreilly Spidering Hacks:

    I have been trying to find a Java book that offered me tips and tricks on how to scrape the Internet, glean the most tasty bits of it, and put them to good use. I ran across "Spidering Hacks", by Kevin Hemenway and Tara Calishain, which was exactly what I wanted - only its base language is Perl.

    To my delight, the authors' writing is so lucid, their support and encouragement so welcome, and their examples so closely matched to my needs - that I immediately picked up this book, and dove headlong into the vast and beautiful world that is Perl.

    Despite my preference for programming in Java for Internet-related tasks, I highly recommend this book, even for those unfamiliar with the Perl programming language, as this book is written so well that you can get up and running purely on the strength of the authors' talents. I am very impressed with this book.

    Kudos to the authors.

     
    5.0

    Spidering Hacks Review

    By Marcus P. Zillman, M.S., A.M.H.A.

    from Undisclosed

    Comments about oreilly Spidering Hacks:

    Excellent job in explaining the real-world solutions to data spidering, scraping, and manipulation of the data. I have educated the Internet community about the positive benefits of bots for years, and this book does an extraordinary job of giving industrial-strength tips, tools, and hacks highlighted in an easy-to-understand format, with concrete step-by-step instructions on the code, running the hack, and hacking the hack. Great job Kevin and Tara!

    Buying Options
    Ebook: $23.99
    Formats: DAISY, PDF
    Print & Ebook: $32.99
    Print: $29.99