The Internet is bigger and better than what a mere browser allows. Webbots, Spiders, and Screen Scrapers is for programmers and businesspeople who want to take full advantage of the vast resources available on the Web. There's no reason to let browsers limit your online experience, especially when you can easily automate online tasks to suit your individual needs.
Learn how to write webbots and spiders that do all this and more:
Programmatically download entire websites
Effectively parse data from web pages
Manage cookies
Decode encrypted files
Automate form submissions
Send and receive email
Send SMS alerts to your cell phone
Unlock password-protected websites
Automatically bid in online auctions
Exchange data with FTP and NNTP servers
Sample projects using standard code libraries reinforce these new skills. You'll learn how to create your own webbots and spiders that track online prices, aggregate different data sources into a single web page, and archive the online data you just can't live without. You'll learn inside information from an experienced webbot developer on how and when to write stealthy webbots that mimic human behavior, tips for developing fault-tolerant designs, and various methods for launching and scheduling webbots. You'll also get advice on how to write webbots and spiders that respect website owners' property rights, plus techniques for shielding websites from unwanted robots.
As a bonus, visit the author's website to test your webbots on sample target pages, and to download the scripts and code libraries used in the book.
Some tasks are just too tedious (or too important!) to leave to humans. Once you've automated your online life, you'll never let a browser limit the way you use the Internet again.
FUNDAMENTAL CONCEPTS AND TECHNIQUES
Chapter 1 WHAT'S IN IT FOR YOU?
Uncovering the Internet's True Potential
What's in It for Developers?
What's in It for Business Leaders?
Final Thoughts
Chapter 2 IDEAS FOR WEBBOT PROJECTS
Inspiration from Browser Limitations
A Few Crazy Ideas to Get You Started
Final Thoughts
Chapter 3 DOWNLOADING WEB PAGES
Think About Files, Not Web Pages
Downloading Files with PHP's Built-in Functions
Introducing PHP/CURL
Installing PHP/CURL
LIB_http
Final Thoughts
Chapter 4 PARSING TECHNIQUES
Parsing Poorly Written HTML
Standard Parse Routines
Using LIB_parse
Useful PHP Functions
Final Thoughts
Chapter 5 AUTOMATING FORM SUBMISSION
Reverse Engineering Form Interfaces
Form Handlers, Data Fields, Methods, and Event Triggers
Unpredictable Forms
Analyzing a Form
Final Thoughts
Chapter 6 MANAGING LARGE AMOUNTS OF DATA
Organizing Data
Making Data Smaller
Thumbnailing Images
Final Thoughts
PROJECTS
Chapter 7 PRICE-MONITORING WEBBOTS
The Target
Designing the Parsing Script
Initialization and Downloading the Target
Further Exploration
Chapter 8 IMAGE-CAPTURING WEBBOTS
Example Image-Capturing Webbot
Creating the Image-Capturing Webbot
Further Exploration
Final Thoughts
Chapter 9 LINK-VERIFICATION WEBBOTS
Creating the Link-Verification Webbot
Running the Webbot
Further Exploration
Chapter 10 ANONYMOUS BROWSING WEBBOTS
Anonymity with Proxies
The Anonymizer Project
Final Thoughts
Chapter 11 SEARCH-RANKING WEBBOTS
Description of a Search Result Page
What the Search-Ranking Webbot Does
Running the Search-Ranking Webbot
How the Search-Ranking Webbot Works
The Search-Ranking Webbot Script
Final Thoughts
Further Exploration
Chapter 12 AGGREGATION WEBBOTS
Choosing Data Sources for Webbots
Example Aggregation Webbot
Adding Filtering to Your Aggregation Webbot
Further Exploration
Chapter 13 FTP WEBBOTS
Example FTP Webbot
PHP and FTP
Further Exploration
Chapter 14 NNTP NEWS WEBBOTS
NNTP Use and History
Webbots and Newsgroups
Further Exploration
Chapter 15 WEBBOTS THAT READ EMAIL
The POP3 Protocol
Executing POP3 Commands with a Webbot
Further Exploration
Chapter 16 WEBBOTS THAT SEND EMAIL
Email, Webbots, and Spam
Sending Mail with SMTP and PHP
Writing a Webbot That Sends Email Notifications
Further Exploration
Chapter 17 CONVERTING A WEBSITE INTO A FUNCTION
Writing a Function Interface
Final Thoughts
ADVANCED TECHNICAL CONSIDERATIONS
Chapter 18 SPIDERS
How Spiders Work
Example Spider
LIB_simple_spider
Experimenting with the Spider
Adding the Payload
Further Exploration
Chapter 19 PROCUREMENT WEBBOTS AND SNIPERS
Procurement Webbot Theory
Sniper Theory
Testing Your Own Webbots and Snipers
Further Exploration
Final Thoughts
Chapter 20 WEBBOTS AND CRYPTOGRAPHY
Designing Webbots That Use Encryption
A Quick Overview of Web Encryption
Local Certificates
Final Thoughts
Chapter 21 AUTHENTICATION
What Is Authentication?
Example Scripts and Practice Pages
Basic Authentication
Session Authentication
Final Thoughts
Chapter 22 ADVANCED COOKIE MANAGEMENT
How Cookies Work
PHP/CURL and Cookies
How Cookies Challenge Webbot Design
Further Exploration
Chapter 23 SCHEDULING WEBBOTS AND SPIDERS
The Windows Task Scheduler
Complex Schedules
Non-Calendar-Based Triggers
Final Thoughts
LARGER CONSIDERATIONS
Chapter 24 DESIGNING STEALTHY WEBBOTS AND SPIDERS
Why Design a Stealthy Webbot?
Stealth Means Simulating Human Patterns
Final Thoughts
Chapter 25 WRITING FAULT-TOLERANT WEBBOTS
Types of Webbot Fault Tolerance
Error Handlers
Chapter 26 DESIGNING WEBBOT-FRIENDLY WEBSITES
Optimizing Web Pages for Search Engine Spiders
Web Design Techniques That Hinder Search Engine Spiders
Michael Schrenk uses webbots and data-driven web applications to create competitive advantages for businesses. He has written for Computerworld and Web Techniques magazines and has taught courses on Web usability and Internet marketing. He has also given presentations on intelligent Web agents and online corporate intelligence at the DEFCON hacker convention.