Webbots, Spiders, and Screen Scrapers, 2nd Edition
A Guide to Developing Internet Agents with PHP/CURL
Publisher: No Starch Press
Final Release Date: March 2012
Pages: 392

There's a wealth of data online, but sorting and gathering it by hand can be tedious and time-consuming. Rather than click through page after endless page, why not let bots do the work for you?

Webbots, Spiders, and Screen Scrapers will show you how to create simple programs with PHP/CURL to mine, parse, and archive online data to help you make informed decisions. Michael Schrenk, a highly regarded webbot developer, teaches you how to develop fault-tolerant designs, how best to launch and schedule the work of your bots, and how to create Internet agents that:

  • Send email or SMS notifications to alert you to new information quickly
  • Search different data sources and combine the results on one page, making the data easier to interpret and analyze
  • Automate purchases, auction bids, and other online activities to save time

Sample projects for automating tasks like price monitoring and news aggregation will show you how to put the concepts you learn into practice.

This second edition of Webbots, Spiders, and Screen Scrapers includes tricks for dealing with sites that are resistant to crawling and scraping, writing stealthy webbots that mimic human search behavior, and using regular expressions to harvest specific data. As you discover the possibilities of web scraping, you'll see how webbots can save you precious time and give you much greater control over the data available on the Web.

Customer Reviews

4.8 (based on 5 reviews)

Ratings Distribution

  • 5 Stars (4)
  • 4 Stars (1)
  • 3 Stars (0)
  • 2 Stars (0)
  • 1 Star (0)

100% of respondents would recommend this to a friend.

Pros

  • Well-written (4)
  • Accurate (3)
  • Easy to understand (3)
  • Helpful examples (3)

Cons

Best Uses

  • Intermediate (4)

Reviewer Profile

  • Developer (3)

Reviewed by 5 customers

     
    4.0

    Screen Scraping for Fun and Profit

    By TrevK

    from Bath, UK

    About Me Developer

    Verified Buyer

    Pros

    • Accurate
    • Easy to understand
    • Helpful examples
    • Well-written

    Cons

      Best Uses

      • Intermediate
      • Novice

      Comments about oreilly Webbots, Spiders, and Screen Scrapers, 2nd Edition:

      I've used this mostly as a guide for useful functions for webbots, as I prefer Python and urllib2 to PHP and CURL (the implementation language and internet library in the book). The book defines useful approaches to automating your interaction with the Internet. It's written in a relaxed but non-patronising style and the author clearly knows his stuff (e.g. his justification for regular expressions not always being the most useful means of parsing text was a revelation to me).

      My only gripe is the author's choice of PHP as the implementation language, but if he'd chosen Python I guess some other reviewer would complain about that :-) Nevertheless, I was really pleased with my purchase.

       
      5.0

      A great book I will read again

      By Phil Ballew

      from California

      About Me Maker, Sys Admin

      Verified Reviewer

      Pros

      • Accurate
      • Easy to understand
      • Helpful examples
      • Well-written

      Cons

        Best Uses

        • Intermediate

        Comments about oreilly Webbots, Spiders, and Screen Scrapers, 2nd Edition:

        This book has allowed me to look at the way information on the Internet exists in a whole new way. I can see how to get the information I need not just by the simple tools provided by a web browser, but by the magic that both PHP and CURL can provide to someone willing to script all their desires into a text file. Finally, a book that allows people to learn the practical skills to make the web work for them. I highly recommend this book to anyone who needs to learn how to get information off the web in an unconventional manner.

         
        5.0

        A good introduction to WebBots

        By Daniel Lewis

        from Beaumont Ca

        Verified Reviewer

        Comments about oreilly Webbots, Spiders, and Screen Scrapers, 2nd Edition:

        As a longtime developer, I have often found myself having to write screen scrapers and crawlers to go out and automate various processes across the net. In Webbots, Spiders, and Screen Scrapers, Michael Schrenk has done a good job of laying out many of the scenarios in which you would want to spin up a spider and how to use them effectively and, more importantly, correctly.

        Of the many different types of webbots discussed in the book, a few that stood out to me as useful were link verification bots, SEO helper bots, FTP webbots, and procurement (or sniper) bots.

        His explanations and examples are detailed and easy to follow.

        If you're interested in learning more about how to make the Internet work for you, this is definitely a book worth picking up and reading.

        The book is written for PHP/CURL, but the concepts can be easily adapted to any programming language you may happen to be writing in.

        Nowadays you could even implement most of the functionality described using Node.js and PhantomJS (of course, you would still need to store your data somewhere).

        Overall it was a good book, and I enjoyed going through it.

        http://sympletech.com/book-review-webbots-spiders-and-screen-scrapers/

         
        5.0

        Awesome book :)

        By Fale

        from Milan, Italy

        About Me Developer

        Verified Reviewer

        Pros

        • Accurate
        • Well-written

        Cons

          Best Uses

          • Expert
          • Intermediate
          • Student

          Comments about oreilly Webbots, Spiders, and Screen Scrapers, 2nd Edition:

          This book caught my attention immediately, since I'm working on a project that is based on a webbot, and I'm developing it using PHP and cURL.

          Since the first day I started using cURL (and a lot of other PHP classes), I have wondered about the sense of all these complex and extravagant classes and functions. The book's author has a really different approach to classes, and his approach is much more similar to mine than to the standard PHP approach. One example of this is the LIB_http.php class, which is a wrapper for some HTTP functions.

          When I started to code my own webbot, I thought I was taking the wrong approach, since I preferred to parse the HTML with regular expressions instead of navigating the document tree. Reading this book, I discovered that the approach I used is the one suggested by the author.

          The book goes as far as describing how to operate a botnet, which scared me at first. Even after reading it all, I still have doubts about whether some of this knowledge should be made public in a book.

          The only part of the book that I did not like was the one about iMacros, since I like to see the whole code of everything I use.

          I really liked the book, so I'll give it a 5/5, and I would suggest it to everyone who is interested in the world of webbots and spiders.

           
          5.0

          Automating data collection with your eye

          By grandslam

          from Honolulu

          About Me Developer, Sys Admin

          Verified Reviewer

          Pros

          • Easy to understand
          • Helpful examples
          • Well-written

          Cons

            Best Uses

            • Intermediate
            • Novice
            • Student

            Comments about oreilly Webbots, Spiders, and Screen Scrapers, 2nd Edition:

            This is a review of Michael's 2nd Edition (I received an early release copy from the publisher; I did not have an opportunity to read the 1st edition):

            I thoroughly enjoyed this book. I found myself glued to the topic; I had heard about it many times before but never investigated it. This is "good stuff" and I missed out by not starting earlier. The author, Michael Schrenk, knows his stuff and is passionate about his craft, and it shows in the way he writes. Throughout the book, his excitement about how incredible this technology is, and about using these tools in creative ways, is contagious. I like to read books by authors who are enthusiastic about their subject matter, as opposed to just droning out facts and knowledge. Reading this book was exciting and addictive. Following along and tinkering with his examples was just plain fun. His excitement and ingenious way of looking at things rub off; even before I got to the real-world examples, the ideas started flowing. It's like I just discovered the next BIG THING, but I'm not going to share that here.

            He does a great job of explaining everything in step-by-step detail and then complements the explanations with photos and diagrams to aid comprehension. His code examples are simple, and it was easy to see what was going on. They are written in an imperative, or procedural, style as opposed to an object-oriented style, which in my opinion is better suited to teaching new or difficult concepts. It also makes them easier to follow for a wider range of people with varying programming backgrounds. He also provides his own supplemental library (via the book website) to simplify using cURL itself. Using his library, I was able to quickly get things up and running and see how everything works, and that is a good thing when learning something new. It puts a positive spin on things and leaves you with nothing but good things to say about the subject you just learned. In the end, would I recommend this book to others? Absolutely. It is just like learning the command line: once you start and see the benefits, you never look back.

            Buying Options
            Ebook: $31.95
            Formats:  ePub, Mobi, PDF
            Print & Ebook: $43.95
            Print: $39.95