If you’re an experienced programmer willing to crunch data, this concise guide will show you how to use machine learning to work with email. You’ll learn how to write algorithms that automatically sort and redirect email based on statistical patterns. Authors Drew Conway and John Myles White take a practical, case-study driven approach rather than a traditional math-heavy presentation.
This book also includes a short tutorial on using the popular R language to manipulate and analyze data. You’ll get clear examples for analyzing sample data and writing machine learning programs with R.
Mine email content with R functions, using a collection of sample files
Analyze the data and use the results to write a Bayesian spam classifier
Rank email by importance, using factors such as thread activity
Use your email ranking analysis to write a priority inbox program
Test your classifier and priority inbox with a separate email sample set
Chapter 1 Using R
R for Machine Learning
Further Reading on R
Chapter 2 Data Exploration
Exploration vs. Confirmation
What is Data?
Inferring the Types of Columns in Your Data
Means, Medians, and Modes
Standard Deviations and Variances
Exploratory Data Visualization
Visualizing the Relationships between Columns
Chapter 3 Classification: Spam Filtering
This or That: Binary Classification
Moving Gently into Conditional Probability
Writing Our First Bayesian Spam Classifier
Chapter 4 Ranking: Priority Inbox
How Do You Sort Something When You Don’t Know the Order?
Drew Conway is a PhD candidate in Politics at NYU. He studies international relations, conflict, and terrorism using the tools of mathematics, statistics, and computer science in an attempt to gain a deeper understanding of these phenomena. His academic curiosity is informed by his years as an analyst in the U.S. intelligence and defense communities.
John Myles White is a Ph.D. student in the Princeton Psychology Department, where he studies how humans make decisions both theoretically and experimentally. Outside of academia, John has been heavily involved in the data science movement, which has pushed for an open source software approach to data analysis. He is also the lead maintainer for several popular R packages, including ProjectTemplate and log4r.
Comments about O’Reilly Machine Learning for Email:
Thanks to the R programming language, the reader can concentrate on the main goal: understanding the core procedures of machine learning. Because the authors also explain the code precisely, readers can follow the techniques clearly even if they can't understand some parts of the code. The machine learning methods introduced are just basic statistical methods, so readers with prior machine learning experience may find them a little tedious. However, the approach is sufficient to classify spam and ham. Beyond spam and ham classification, the book introduces a way to rank emails using several practical ideas:
- If the time between a user viewing an email and sending a response is short, the email is probably important to that user.
- If a user interacts with a thread over a long period, the thread is probably important to that user, so the terms in that thread are ranked highly.
Throughout the book, the sample code uses practical sample email data that can be obtained from the web, so the methods address practical use cases despite their simplicity.
Bottom Line: Yes, I would recommend this to a friend