Building a simple but powerful recommendation system is much easier than you think. Approachable for all levels of expertise, this report explains innovations that make machine learning practical for business production settings—and demonstrates how even a small-scale development team can design an effective large-scale recommendation system.
Apache Mahout committers Ted Dunning and Ellen Friedman walk you through a design that relies on careful simplification. You’ll learn how to collect the right data, analyze it with an algorithm from the Mahout library, and then easily deploy the recommender using search technology such as Apache Solr or Elasticsearch. This combination is both powerful and efficient: learning happens offline, and rapid-response recommendations are delivered in real time.
Understand the tradeoffs between simple and complex recommenders
Collect user data that tracks user actions—rather than their ratings
Predict what a user wants based on behavior by others, using Mahout for co-occurrence analysis
Use search technology to offer recommendations in real time, complete with item metadata
Watch the recommender in action with a music service example
Improve your recommender with dithering, multimodal recommendation, and other techniques
Practical Machine Learning: Innovations in Recommendation
Ted Dunning is Chief Applications Architect at MapR Technologies, a committer and PMC member of the Apache Mahout, ZooKeeper, and Drill projects, and a mentor for the Apache Storm, DataFu, Flink, and Optiq projects. He contributed to Mahout's clustering, classification, and matrix decomposition algorithms and helped expand the new version of the Mahout Math library. Ted was the chief architect behind the MusicMatch (now Yahoo Music) and Veoh recommendation systems, built fraud-detection systems for ID Analytics (LifeLock), and is the inventor of over 24 issued patents to date. Ted has a PhD in computing science from the University of Sheffield. When he’s not doing data science, he plays guitar and mandolin. Ted is on Twitter at @ted_dunning.
Ellen Friedman is a consultant and commentator, currently writing mainly about big data topics. She is a committer for the Apache Mahout project and a contributor to the Apache Drill project. With a PhD in Biochemistry, she has years of experience as a research scientist and has written about a variety of technical topics including molecular biology, nontraditional inheritance, and oceanography. Ellen is also co-author of a book of magic-themed cartoons, A Rabbit Under the Hat. Ellen is on Twitter at @Ellen_Friedman.