Millions of public Twitter streams harbor a wealth of data, and mining them can yield valuable insights. This concise book offers a collection of recipes to help you extract nuggets of Twitter information using easy-to-learn Python tools. Each recipe includes a discussion of how and why the solution works, so you can quickly adapt it to fit your particular needs. The recipes include techniques to:
Use OAuth to access Twitter data
Create and analyze graphs of retweet relationships
Use the streaming API to harvest tweets in real time
Harvest and analyze friends and followers
Discover friendship cliques
Summarize webpages from short URLs
This book is a perfect companion to O’Reilly’s Mining the Social Web.
Chapter 1 The Recipes
Using OAuth to Access Twitter APIs
Looking Up the Trending Topics
Extracting Tweet Entities
Searching for Tweets
Extracting a Retweet’s Origins
Creating a Graph of Retweet Relationships
Visualizing a Graph of Retweet Relationships
Capturing Tweets in Real-time with the Streaming API
Making Robust Twitter Requests
Harvesting Tweets
Creating a Tag Cloud from Tweet Entities
Summarizing Link Targets
Harvesting Friends and Followers
Performing Setwise Operations on Friendship Data
Resolving User Profile Information
Crawling Followers to Approximate Potential Influence
Analyzing Friendship Relationships such as Friends of Friends
Analyzing Friendship Cliques
Analyzing the Authors of Tweets that Appear in Search Results
Matthew Russell, Vice President of Engineering at Digital Reasoning Systems (http://www.digitalreasoning.com/) and Principal at Zaffra (http://zaffra.com), is a computer scientist who is passionate about data mining, open source, and web application technologies. He’s also the author of Dojo: The Definitive Guide (O’Reilly).
Comments about O'Reilly Media 21 Recipes for Mining Twitter:
The book 21 Recipes for Mining Twitter is an add-on to another book I am reviewing by Matthew Russell, Mining the Social Web.
This small, yet incredibly useful, book covers 21 tips and accompanying code for mining Twitter data. There is no fluff in this 60-page book; page 1 dives right into OAuth access.
Each of the tips (recipes) starts with the problem, a brief solution, and then the lengthy solution and code samples that bring the two together. Everything in the book is written in Python, with much of it made accessible via easy_install.
While the majority of this book is code, it is an incredible companion to get you moving in pulling data, trends or just about anything from Twitter. It makes it easier to create and analyze graphs, discover friendships and cliques, pull geo-data and even find a retweet's source.
Much of the metadata we produce via Twitter gets lost instantly, since no one digs and mines the underlying data. This book can help you build some product or service you want around Twitter and hands you basic code to get you started. The book 21 Recipes for Mining Twitter is a great resource.
3/7/2011
4.0
Full of tips to start mining Twitter
By jsanpedro
from State College, PA
About Me Researcher
Pros
Concise
Helpful examples
Cons
Assumes API knowledge
Best Uses
Intermediate
Novice
Comments about O'Reilly Media 21 Recipes for Mining Twitter:
This book provides readers with a fairly comprehensive introduction to extracting and analyzing information from Twitter. While the reader is expected to be somewhat familiar with the different Twitter APIs, the author does a fantastic job of presenting strategies for crawling and mining data using Python and some additional, freely available third-party libraries.
The three main aspects that I loved about this little gem were:
- The author does a great job of highlighting the main limitations of Twitter's APIs (e.g. the maximum number of requests for each API call) and their bugs (e.g. user ids being different in the '/search' API). Solutions, in the form of functional code, are given. This information can save literally hours of debugging code or of waiting for Twitter to lift restrictions imposed after exceeding some of the system's limits.
- All the code, available for free from the author's github.com account, is very well conceived, illustrative and most of the time can be used directly from the command line to perform simple tasks with Twitter's data.
- Lots of third-party libraries and tools (e.g. CouchDB, Redis, Protovis) are introduced to the reader and used in appropriate contexts, that is, when they actually make the code easier to read or simply more flexible in terms of scalability. I've learned quite a few tricks that are changing the way I work with data (and not just Twitter data).
On the other hand, I really missed a short introduction to the main Twitter APIs. It's confusing to read about "statuses" or "timelines" without a prior formal definition. It took me quite some time to distill the appropriate information from Twitter's developer documentation.
A must-read if you are planning to work with Twitter data.