Programming Collective Intelligence
Building Smart Web 2.0 Applications
Publisher: O'Reilly Media
Final Release Date: August 2007
Pages: 362

Want to tap the power behind search rankings, product recommendations, social bookmarking, and online matchmaking? This fascinating book demonstrates how you can build Web 2.0 applications to mine the enormous amount of data created by people on the Internet. With the sophisticated algorithms in this book, you can write smart programs to access interesting datasets from other web sites, collect data from users of your own applications, and analyze and understand the data once you've found it.

Programming Collective Intelligence takes you into the world of machine learning and statistics, and explains how to draw conclusions about user experience, marketing, personal tastes, and human behavior in general -- all from information that you and others collect every day. Each algorithm is described clearly and concisely with code that can immediately be used on your web site, blog, Wiki, or specialized application. This book explains:

  • Collaborative filtering techniques that enable online retailers to recommend products or media
  • Methods of clustering to detect groups of similar items in a large dataset
  • Search engine features -- crawlers, indexers, query engines, and the PageRank algorithm
  • Optimization algorithms that search millions of possible solutions to a problem and choose the best one
  • Bayesian filtering, used in spam filters for classifying documents based on word types and other features
  • Using decision trees not only to make predictions, but to model the way decisions are made
  • Predicting numerical values rather than classifications to build price models
  • Support vector machines to match people in online dating sites
  • Non-negative matrix factorization to find the independent features in a dataset
  • Evolving intelligence for problem solving -- how a computer develops its skill by improving its own code the more it plays a game
Each chapter includes exercises for extending the algorithms to make them more powerful. Go beyond simple database-backed applications and put the wealth of Internet data to work for you.
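The PageRank algorithm listed above ranks pages by the links that point to them. The minimal Python sketch below is not the book's code. It iterates one common unnormalized form of the formula, PR(p) = (1 - d) + d * sum of PR(q) / outlinks(q) over every page q linking to p. The damping factor d = 0.85, the 20 iterations and the three-page link graph are all assumptions chosen for illustration.

```python
def pagerank(links, damping=0.85, iterations=20):
    """Iteratively compute unnormalized PageRank scores.

    links maps each page to the list of pages it links to (hypothetical data).
    Each round sets PR(p) = (1 - d) + d * sum(PR(q) / outdegree(q))
    over all pages q that link to p.
    """
    pages = set(links)
    for targets in links.values():
        pages.update(targets)

    # Start every page with a score of 1.0
    scores = {page: 1.0 for page in pages}

    for _ in range(iterations):
        new_scores = {}
        for page in pages:
            # Add up the share of rank passed on by every page linking here
            incoming = sum(
                scores[source] / len(targets)
                for source, targets in links.items()
                if page in targets
            )
            new_scores[page] = (1 - damping) + damping * incoming
        scores = new_scores
    return scores


# Hypothetical three-page site
links = {
    "home": ["about", "blog"],
    "about": ["home"],
    "blog": ["home", "about"],
}
ranks = pagerank(links)
for page, score in sorted(ranks.items(), key=lambda kv: -kv[1]):
    print(f"{page}: {score:.3f}")
```

Every page here has outgoing links, so in this form the scores always sum to the number of pages. "home" ends up highest because it receives all of "about"'s score plus half of "blog"'s.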

"Bravo! I cannot think of a better way for a developer to first learn these algorithms and methods, nor can I think of a better way for me (an old AI dog) to reinvigorate my knowledge of the details."
-- Dan Russell, Google

"Toby's book does a great job of breaking down the complex subject matter of machine-learning algorithms into practical, easy-to-understand examples that can be directly applied to analysis of social interaction across the Web today. If I had this book two years ago, it would have saved precious time going down some fruitless paths."
-- Tim Wolters, CTO, Collective Intellect
Customer Reviews

4.4 out of 5 (based on 17 reviews)

Ratings Distribution

  • 5 Stars (9)
  • 4 Stars (5)
  • 3 Stars (3)
  • 2 Stars (0)
  • 1 Star (0)

100% of respondents would recommend this to a friend.

Pros

  • Concise (3)
  • Easy to understand (3)
  • Well-written (3)

Best Uses

  • Intermediate (3)

Reviewer Profile

  • Developer (4)

    Reviewed by 17 customers (displaying reviews 1-10)

    (1 of 1 customers found this review helpful)

     
    3.0

    Errors all over

    By mlt

    from Las Vegas, NV

    About Me Developer

    Verified Reviewer

    Pros

    • Easy to understand

    Cons

    • Too many errors

    Best Uses

    • Intermediate
    • Novice
    • Student

    Comments about O'Reilly Programming Collective Intelligence:

    This book has great content. It honestly seems like a great introduction to machine learning. The reason I gave it 3 stars is:

    - As of 5/19/2014, none of the kiwitobes.com links in the book work. This leaves you stranded, searching all over for the files that some chapters need. Thankfully, people have mirrored some of the content on GitHub.

    - There are a lot of errata. Keep the unofficial errata handy, and check them when your code doesn't behave as expected.

    - There are some code errors.

    - If you're not a Python expert, many examples leave you guessing about what they do. Code commenting is abysmal.

    All in all, the book seems good content-wise but very poor in terms of execution. The fact that the author took down everything I've seen so far on kiwitobes.com is really disheartening. If you're resourceful, this book is good. If you're looking for a no-frills book that you can simply read and follow along with, everything I listed above will make this book a nightmare for you.

     
    5.0

    Concise and Informative

    By jesserosato

    from Sacramento, CA

    About Me Developer

    Verified Buyer

    Pros

    • Concise
    • Easy to understand
    • Helpful examples
    • Well-written

      Best Uses

      • Intermediate
      • Student

      Comments about O'Reilly Programming Collective Intelligence:

      This book is a super helpful introduction to some complicated topics with clear examples and applications.

       
      5.0

      Great book on Machine Learning

      By Blaize

      from Fairfax, VA

      About Me Developer

      Verified Buyer

      Pros

      • Accurate
      • Concise
      • Helpful examples
      • Well-written

        Best Uses

        • Expert

        Comments about O'Reilly Programming Collective Intelligence:

        If you need to understand the inner workings of Machine Learning algorithms this is the book.

        (8 of 8 customers found this review helpful)

         
        4.0

        Recommended, despite the code issues

        By zsoldosp

        from Germany

        About Me Developer

        Verified Reviewer

        Pros

        • Concise
        • Easy to understand
        • Well-written

          Best Uses

          • Intermediate
          • Novice

          Comments about O'Reilly Programming Collective Intelligence:

          Disclaimer: I received a free (electronic) copy of this ebook (Programming Collective Intelligence by Toby Segaran) from O'Reilly as part of the O'Reilly Blogger Review Program, which also requires me to write a review about it. That aside, I would have purchased this book this year anyway, and would have reviewed it on this blog too.

          About me and why I read this book

          I've been programming professionally for ~7.5 years, mainly on business applications and reporting, so I already have quite a love for data. While I haven't used math much in my day jobs, I liked math (and was good at it) in high school, including taking extra classes, so I have learned basic statistics. Refreshing and advancing my data analytics skills is one of my goals this year, and reading this book was part of the plan.

          About the book

          The book introduces lots of algorithms that can be used to gain new insight into any kind of data one might come across. The explanations are broken up into digestible chunks and are supported by great visualizations. Although the later chunks build on the earlier ones, this structure allowed me to read through most of the book on the train to and from work.

          Each algorithm is illustrated with real-world application examples, and the book also shows examples where applying an algorithm doesn't make sense. The exercises at the end of the chapters are applied rather than purely theoretical, and coming up with exercises from the domain I work with every day was pretty easy! The book is really inspiring, which is great for an introductory book!

          In addition to the well written, gradual introduction, the book has a concise algorithm reference at the end, so when one needs a quick refresher, there is no need to wade through the lengthy tutorials.

          While the prose and the logic of the explanations are great, I found the code samples hard to follow. Problems include really short, cryptic variable names, leaky abstractions and an inconsistent coding style, to name just a few. Some code samples are actually incorrect implementations of the given algorithm. There are also antipatterns, such as building SQL by string concatenation, without a comment warning the reader that it's bad practice.
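The string-concatenation antipattern the reviewer flags is easy to demonstrate. The minimal sketch below is not the book's code. It uses Python's built-in sqlite3 module with an in-memory database, and the `wordlist` table and the `find_word` / `find_word_unsafe` helpers are hypothetical names for illustration. The parameterized version passes the value separately through a `?` placeholder, so a quote character in the input cannot change the statement.

```python
import sqlite3

# Hypothetical in-memory table for illustration
conn = sqlite3.connect(":memory:")
conn.execute("create table wordlist (word text)")
conn.executemany("insert into wordlist (word) values (?)",
                 [("python",), ("o'reilly",)])

def find_word_unsafe(word):
    # Antipattern: the value is pasted into the SQL text, so a quote in
    # the input breaks the statement (or, worse, changes what it does)
    return conn.execute(
        "select rowid from wordlist where word='%s'" % word).fetchone()

def find_word(word):
    # Parameterized query: sqlite3 binds the value separately from the SQL
    return conn.execute(
        "select rowid from wordlist where word=?", (word,)).fetchone()

print(find_word("o'reilly"))  # prints (2,)
try:
    find_word_unsafe("o'reilly")
except sqlite3.OperationalError as err:
    print("unsafe version failed:", err)
```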

          Nonetheless, it is great to have actual code to play with, even if the initial reading and reviewing of it requires some extra effort.

          The book claims that you don't need previous Python knowledge to understand the code samples, which I can't confirm (I use Python at my day job), but I wouldn't be surprised if not knowing Python could make understanding the code even more difficult (I've actually learned a few new language features from the samples!). Also, the Python language has come a long way since 2.4, which is the version used in the book - and that old version makes the code feel dated.

          The book was written in 2007, but its content is not dated. First, the foundations of any topic tend to be timeless, and the most recent algorithm the book describes was published in 1990. Second, its Table of Contents is comparable to those of more recently written books (though I haven't read other introductory books yet).

          In summary: I would recommend it as a great introductory book!

          (15 of 16 customers found this review helpful)

           
          5.0

          The Python way to collective intelligence

          By dwa

          from Undisclosed

          Comments about O'Reilly Programming Collective Intelligence:

          Programming Collective Intelligence is a new book from O'Reilly, which was written by Toby Segaran. The author graduated from MIT and is currently working at Metaweb Technologies. He develops ways to put large public datasets into Freebase, a free online semantic database. You can find more information about him on his blog: http://blog.kiwitobes.com/.

          Web 2.0 cannot exist without Collective Intelligence. The "giants" use it everywhere: YouTube recommends similar videos, Last.fm knows what you would like to listen to, Flickr knows which photos are your favorites, and so on. This technology powers intelligent search, clustering, price modeling and ranking on the web. I cannot imagine a modern service without data analysis, which is why it is worth starting to read about it.

          There are many titles about collective intelligence, and I recently read two of them: this one and "Collective Intelligence in Action". Both are very pragmatic, but the O'Reilly one is more focused on the substance of CI. Its code listings are much shorter, and because the examples are written in Python, they were easy to follow. In general, comparing these books is like comparing Java with Python. If you wanted to build a recommendation engine the "in Action"/Java way, you would have to read the whole book, add extra JARs and design dozens of classes. The rapid Python way requires reading only 15 pages and voila, you have your first recommendations. It is awesome!

          So what about the rest of the book? There are still 319 pages! Further chapters cover discovering groups, searching, ranking, optimization, document filtering, decision trees, price models and genetic algorithms. The book explains how to implement Simulated Annealing, k-Nearest Neighbors, a Bayesian classifier and many more. Take a look at the table of contents (here: http://oreilly.com/catalog/9780596529321/preview.html). It does not list all the algorithms, but you can find more information there.
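The k-Nearest Neighbors algorithm mentioned above can predict a number, such as a price, by averaging the values of the most similar known examples. The minimal sketch below is not the book's implementation. It assumes Euclidean distance (`math.dist`, Python 3.8+) and an unweighted average, and the (rating, age) feature vectors and prices are made-up illustration data.

```python
from math import dist  # Python 3.8+

def knn_estimate(data, query, k=3):
    """Estimate a numeric value for `query` as the mean value of its k nearest neighbors.

    data is a list of (features, value) pairs; features are equal-length tuples.
    """
    # Sort known examples by Euclidean distance to the query and keep the k closest
    neighbors = sorted(data, key=lambda row: dist(row[0], query))[:k]
    return sum(value for _, value in neighbors) / len(neighbors)

# Hypothetical examples: (rating, age in years) -> price
examples = [
    ((95, 10), 120.0),
    ((90, 12), 110.0),
    ((92, 8), 115.0),
    ((70, 30), 40.0),
    ((65, 25), 35.0),
]
print(knn_estimate(examples, (93, 11)))
```

A weighted average, in which closer neighbors count more, is a common refinement of this plain mean.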

          Each chapter has about 20-30 pages. You do not have to read them all. You can pick the most important ones and still follow what is going on. Every chapter contains only a minimal theoretical introduction, which might not be enough for total beginners. I recommend this book for students who have taken a statistics course (not only in IT or computer science). It will show you how to use your knowledge in practice, and there are many inspiring examples.

          For those who do not know Python, do not be afraid: the beginning of the book has a short introduction to the language syntax. All listings are very short and well described by the author, sometimes line by line. The book also covers the basic standard libraries for XML processing and downloading web pages.

          If you would like to start learning about collective intelligence, I would strongly recommend reading "Programming Collective Intelligence" first, then "Collective Intelligence in Action". The first shows how easy it is to implement basic algorithms, and the second shows you how to use existing open source projects related to machine learning.

          You can find more about this book on its catalog page: http://oreilly.com/catalog/9780596529321/

          (3 of 3 customers found this review helpful)

           
          4.0

          A fascinating read with lots of code examples

          By www.thegeniusfiles.com

          from Undisclosed

          Comments about O'Reilly Programming Collective Intelligence:

          If you are a computer science student and want to learn about the algorithms and theory behind Web 2.0, this is a good place to start. Although the author tries to make the book intelligible to novices, you will benefit from having some previous programming experience, perhaps up to the 300 level. Also, the code is in Python, so you might want to study up on that first. In my opinion, a good Linux distro like Ubuntu will simplify the coding experience: it's easier to download and install the Python libraries with Synaptic on Ubuntu than to install them on Windows.

          The really nice thing about this book is that the author explains the principle of what each code example is doing before launching into the code. That's important because much of it is grounded in methods of statistical analysis.

          As another reviewer pointed out, there are some errors in some of the code examples. If you have no prior experience with Python, this would be very confusing. However, you can access the revisions through Safari Online, so all is not lost.

          If it weren't for the code errors, I'd give this book 5 out of 5.

          (5 of 5 customers found this review helpful)

           
          5.0

          A visionary book that illuminates the Internet

          By AlexeySmirnov

          from Undisclosed

          Comments about O'Reilly Programming Collective Intelligence:

          This is a visionary book because it predicts a lot of what will soon happen to the Internet. How do we process information in the Internet age? Instead of reading magazines and newspapers, we use blogs as our source of news, because blogs offer a much more customized news feed. In a typical newspaper, how much of the content is of interest to a given reader? I would guess half at most, and typically it is less than that.

          I start my working day by consuming two sweet drinks. One is a cup of coffee. The other is a virtual information soup made of 100 blogs. I glance quickly over most of the stories using Google Reader and select the ones that interest me. I might read them in greater detail later in the day, in the evening or on a weekend. I do not know which drink gives me more pleasure, the delicious cup of coffee or the sweet virtual soup. I like the latter a lot because it is rich in media content: bright images, cool videos and wow-type web pages.

          However, I often come across news that I wish I had found earlier. In other words, there are so many news sources that reading them all, or even just scanning the headlines of the major blogs, takes too much time. We need a targeted information delivery service.

          This is the main idea of this book. In fact, it starts by explaining how to make recommendations from the preferences of a number of people plus your own preferences. What are the cool things that everybody else has tried but you haven't yet? The example in the book is applied to Delicious, which does not offer recommendations yet.
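The recommendation idea described here is to score the items you have not tried by how highly similar people rated them. It can be sketched as a similarity-weighted average. This minimal sketch is not the book's code. It assumes preferences are stored as nested dicts of ratings and uses a simple inverse Euclidean-distance similarity (Pearson correlation is another common choice). The people, bookmarks and ratings are all hypothetical.

```python
from math import sqrt

def similarity(prefs, a, b):
    """Return a similarity in (0, 1] based on the items both people rated, or 0 if none."""
    shared = [item for item in prefs[a] if item in prefs[b]]
    if not shared:
        return 0.0
    distance = sqrt(sum((prefs[a][i] - prefs[b][i]) ** 2 for i in shared))
    return 1 / (1 + distance)

def recommend(prefs, person):
    """Rank items `person` has not rated by the similarity-weighted average of others' ratings."""
    totals, weights = {}, {}
    for other in prefs:
        if other == person:
            continue
        sim = similarity(prefs, person, other)
        if sim <= 0:
            continue
        for item, rating in prefs[other].items():
            if item not in prefs[person]:
                totals[item] = totals.get(item, 0) + rating * sim
                weights[item] = weights.get(item, 0) + sim
    scores = [(totals[item] / weights[item], item) for item in totals]
    return sorted(scores, reverse=True)

# Hypothetical bookmark ratings (1-5)
prefs = {
    "alice": {"python-tips": 5, "ml-intro": 4, "css-tricks": 1},
    "bob":   {"python-tips": 5, "ml-intro": 5, "numpy-guide": 5},
    "carol": {"python-tips": 1, "css-tricks": 5, "jquery-howto": 4},
    "you":   {"python-tips": 5, "ml-intro": 4},
}
print(recommend(prefs, "you"))
```

Dividing by the summed similarities keeps an item from ranking highly just because many people rated it.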

          I often try to work out what my interests are. The blogs I read might answer this question if someone grouped them. I have actually done this manually, but I found that my categorization is not perfect. The book addresses this question in Chapter 3.
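One way to group blogs automatically is k-means clustering over word-count vectors. This minimal sketch is not the book's code. It assumes each blog has already been reduced to a hypothetical vector of word counts. For reproducibility, it seeds the centroids with the first k vectors rather than with random points, and it uses `math.dist` (Python 3.8+).

```python
from math import dist  # Python 3.8+

def kmeans(vectors, k, iterations=10):
    """Group vectors into k clusters; returns a list of index lists."""
    # Deterministic start (an assumption): use the first k vectors as centroids
    centroids = [list(v) for v in vectors[:k]]
    clusters = []
    for _ in range(iterations):
        # Assignment step: attach each vector to its nearest centroid
        clusters = [[] for _ in range(k)]
        for index, vector in enumerate(vectors):
            nearest = min(range(k), key=lambda c: dist(vector, centroids[c]))
            clusters[nearest].append(index)
        # Update step: move each centroid to the mean of its members
        for c, members in enumerate(clusters):
            if members:
                centroids[c] = [
                    sum(vectors[i][d] for i in members) / len(members)
                    for d in range(len(vectors[0]))
                ]
    return clusters

# Hypothetical counts of the words ("python", "recipe", "oven") in each blog
blogs = ["code-blog", "bake-blog", "dev-notes", "cake-daily"]
counts = [(9, 0, 1), (0, 7, 8), (8, 1, 0), (1, 9, 6)]
for members in kmeans(counts, k=2):
    print([blogs[i] for i in members])
```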

          After that, the book digresses into a number of additional topics, such as search, neural networks and discrete optimization. The author, Toby Segaran, has a great ability to explain difficult concepts using simple words and pictures. Most of the material was familiar to me, so I was struck by how easy each concept seemed here compared with how long it originally took me to understand it.

          After that, the book's main melody returns: the next chapter explains how to filter documents, for example to decide whether a particular news story is interesting to you. Then the book digresses again into decision trees, building price models and even matching people on a dating site. But our melody comes back once more, this time explaining how to extract trends from many news sources, that is, how to work out what people are discussing today. This feature is similar to Google News, except that in Google News the user has no control over the news sources.
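Deciding whether a news story is interesting is a natural fit for the Bayesian classifier the book covers. The minimal naive Bayes sketch below is not the book's implementation. It assumes documents are split into lowercase words by a simple regular expression, uses a hypothetical two-category training set, and applies add-one (Laplace) smoothing so that unseen words do not zero out a category.

```python
import re
from collections import Counter, defaultdict
from math import log

def words(text):
    """Split text into lowercase words (a deliberately simple tokenizer)."""
    return re.findall(r"[a-z']+", text.lower())

class NaiveBayes:
    def __init__(self):
        self.word_counts = defaultdict(Counter)  # category -> word -> count
        self.doc_counts = Counter()              # category -> number of documents

    def train(self, text, category):
        self.word_counts[category].update(words(text))
        self.doc_counts[category] += 1

    def classify(self, text):
        vocabulary = set()
        for counts in self.word_counts.values():
            vocabulary.update(counts)
        total_docs = sum(self.doc_counts.values())
        best, best_score = None, float("-inf")
        for category, counts in self.word_counts.items():
            # log P(category) + sum of log P(word | category), with add-one smoothing
            score = log(self.doc_counts[category] / total_docs)
            total_words = sum(counts.values())
            for word in words(text):
                score += log((counts[word] + 1) / (total_words + len(vocabulary)))
            if score > best_score:
                best, best_score = category, score
        return best

# Hypothetical training stories
nb = NaiveBayes()
nb.train("new python release adds pattern matching", "interesting")
nb.train("machine learning library for python", "interesting")
nb.train("celebrity gossip and red carpet photos", "boring")
nb.train("gossip about celebrity weddings", "boring")
print(nb.classify("python machine learning tutorial"))
```

Summing logarithms instead of multiplying probabilities avoids numeric underflow on long documents.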

          I was surprised to find out that Python is such a popular language in the scientific community. The book describes lots of libraries for working with numerical data or drawing various charts. It will serve as a great introduction to the Python language, even though lots of introductory books are available. In fact, learning Python this way is easier and more enjoyable.

          After reading the book I definitely want to try out the tricks explained there and improve my information soup. This book is my virtual cookbook.

          (3 of 7 customers found this review helpful)

           
          3.0

          Good book, bad code.

          By Anonymous

          from Undisclosed

          Comments about O'Reilly Programming Collective Intelligence:

          Pretty interesting book, definitely worth a read, at least for self-taught guys like me. Too bad the code examples are of such low quality. Prepare for a complete rewrite if you plan on using them (M. Segaran, we'd be happy to see you submit your code for review on comp.lang.python !-)

          Now don't get me wrong: it's still one of the very few CS-related books I didn't regret buying.

          (3 of 4 customers found this review helpful)

           
          3.0

          Information is great, but too many errors

          By Amit Lamba

          from Undisclosed

          Comments about O'Reilly Programming Collective Intelligence:

          I've just purchased the book, and I don't doubt that it is full of amazing information on the topic of Collective Intelligence. But without even getting past the Preface, I've already found 2 errors in actual code examples! I then checked the errata list here on O'Reilly. A lot of the errors are unconfirmed, but the number still seems alarmingly high. Errata are to be expected, but with a book that deals so heavily in mathematical formulae and code snippets, it's a pain to have to cross-check everything. The book should have been proofread more carefully before it was pushed out to the masses. I have the August 2007 version of the book. If there is a 03/2008 printing with the errata fixed, I'd much rather have that version. If someone can confirm this, I'd gladly revise my review, if that's even possible. Without these errors, this would easily be a 5-star book.

           
          5.0

          Very informative, engaging read. Code a bit terse

          By Jeff

          from Undisclosed

          Comments about O'Reilly Programming Collective Intelligence:

          Everything everyone else has said about how well written this book is, how applicable the examples are and so on is spot on. It's a very engaging read.

          I have a single request. Could someone make a downloadable version of the code available with more descriptive variable names? (Terse is fine for the book itself.) I'm finding that I have to rename variables as I go so that I can grasp the math more easily. Here is an example of one that I've renamed:

              from math import sqrt

              def sim_pearson(prefs, p1, p2):
                  # Get the list of mutually rated items
                  mutuallyRatedItems = {}
                  for item in prefs[p1]:
                      if item in prefs[p2]:
                          mutuallyRatedItems[item] = 1

                  # If there are no ratings in common, return 0
                  if len(mutuallyRatedItems) == 0:
                      return 0

                  # Sum calculations
                  numMutuallyRatedItems = len(mutuallyRatedItems)

                  # Sums of all the preferences
                  sum_Person1_MutualRatings = sum([prefs[p1][item] for item in mutuallyRatedItems])
                  sum_Person2_MutualRatings = sum([prefs[p2][item] for item in mutuallyRatedItems])

                  # Sums of the squares
                  sumOfSquaresOfPerson1MutualRatings = sum([pow(prefs[p1][item], 2) for item in mutuallyRatedItems])
                  sumOfSquaresOfPerson2MutualRatings = sum([pow(prefs[p2][item], 2) for item in mutuallyRatedItems])

                  # Sum of the products
                  sum_ProductOf_Ratings_OfBothUsers_MutualItems = sum([prefs[p1][item] * prefs[p2][item] for item in mutuallyRatedItems])

                  # Calculate r (Pearson score)
                  numerator = sum_ProductOf_Ratings_OfBothUsers_MutualItems - (sum_Person1_MutualRatings * sum_Person2_MutualRatings / numMutuallyRatedItems)
                  denominator = sqrt((sumOfSquaresOfPerson1MutualRatings - pow(sum_Person1_MutualRatings, 2) / numMutuallyRatedItems) * (sumOfSquaresOfPerson2MutualRatings - pow(sum_Person2_MutualRatings, 2) / numMutuallyRatedItems))
                  if denominator == 0:
                      return 0

                  pearsonCorrelation = numerator / denominator
                  return pearsonCorrelation

          Great book. I'm really enjoying it.


           
          Buying Options
          Ebook: $33.99
Formats: APK, DAISY, ePub, Mobi, PDF
          Print & Ebook: $43.99
          Print: $39.99