Book description
There's a wealth of data online, but sorting and gathering it by hand can be tedious and time-consuming. Rather than click through page after endless page, why not let bots do the work for you? Webbots, Spiders, and Screen Scrapers will show you how to create simple programs with PHP/CURL to mine, parse, and archive online data to help you make informed decisions.
Table of contents
- Webbots, Spiders, and Screen Scrapers: A Guide to Developing Internet Agents with PHP/CURL
- About the Author
- About the Technical Reviewer
- Acknowledgments
- Introduction
- I. Fundamental Concepts and Techniques
- 1. What’s in It for You?
- 2. Ideas for Webbot Projects
- 3. Downloading Web Pages
- 4. Basic Parsing Techniques
- 5. Advanced Parsing with Regular Expressions
- Pattern Matching, the Key to Regular Expressions
- PHP Regular Expression Types
- Learning Patterns Through Examples
- Regular Expressions of Particular Interest to Webbot Developers
- When Regular Expressions Are (or Aren’t) the Right Parsing Tool
- Final Thoughts
- 6. Automating Form Submission
- 7. Managing Large Amounts of Data
- II. Projects
- 8. Price-Monitoring Webbots
- 9. Image-Capturing Webbots
- 10. Link-Verification Webbots
- 11. Search-Ranking Webbots
- 12. Aggregation Webbots
- 13. FTP Webbots
- 14. Webbots That Read Email
- 15. Webbots That Send Email
- 16. Converting a Website into a Function
- III. Advanced Technical Considerations
- 17. Spiders
- 18. Procurement Webbots and Snipers
- 19. Webbots and Cryptography
- 20. Authentication
- 21. Advanced Cookie Management
- 22. Scheduling Webbots and Spiders
- 23. Scraping Difficult Websites with Browser Macros
- 24. Hacking iMacros
- 25. Deployment and Scaling
- IV. Larger Considerations
- 26. Designing Stealthy Webbots and Spiders
- 27. Proxies
- 28. Writing Fault-Tolerant Webbots
- 29. Designing Webbot-Friendly Websites
- 30. Killing Spiders
- 31. Keeping Webbots out of Trouble
- A. PHP/CURL Reference
- Creating a Minimal PHP/CURL Session
- Initiating PHP/CURL Sessions
- Setting PHP/CURL Options
- CURLOPT_URL
- CURLOPT_RETURNTRANSFER
- CURLOPT_REFERER
- CURLOPT_FOLLOWLOCATION and CURLOPT_MAXREDIRS
- CURLOPT_USERAGENT
- CURLOPT_NOBODY and CURLOPT_HEADER
- CURLOPT_TIMEOUT
- CURLOPT_COOKIEFILE and CURLOPT_COOKIEJAR
- CURLOPT_HTTPHEADER
- CURLOPT_SSL_VERIFYPEER
- CURLOPT_USERPWD and CURLOPT_UNRESTRICTED_AUTH
- CURLOPT_POST and CURLOPT_POSTFIELDS
- CURLOPT_VERBOSE
- CURLOPT_PORT
- Executing the PHP/CURL Command
- Closing PHP/CURL Sessions
- B. Status Codes
- C. SMS Gateways
- Index
- About the Author
- Colophon
Product information
- Title: Webbots, Spiders, and Screen Scrapers, 2nd Edition
- Author(s):
- Release date: March 2012
- Publisher(s): No Starch Press
- ISBN: 9781593273972