Once you've accumulated a pile of data through your web application, what do you do with it? In this insightful video course, bit.ly lead scientist Hilary Mason shows you how to solve data analysis problems using basic machine learning techniques and frameworks. You'll follow several examples through the entire process—from obtaining, cleaning, and exploring data to building a model and interpreting the results.
Examine several real-world analysis solutions, including supervised learning and classification, unsupervised learning and clustering, and building common machine learning applications such as recommendation systems. If you're a developer interested in the math and processes necessary to apply machine learning techniques to web data, this video course is for you.
Intended Audience:
This class is intended for developers who want an introduction to the math and processes needed to apply machine learning techniques to web data.
Introduction: 20 minutes
Classifying Web Documents - The Theory: 29 minutes
Classifying Web Documents - The Code: 47 minutes
Clustering, Recommendations, and Probability: 54 minutes
Conclusion: 11 minutes
Title:
Hilary Mason: An Introduction to Machine Learning with Web Data
Hilary Mason is the lead scientist at bit.ly, where she is finding sense in vast data sets. She is a former computer science professor with a background in machine learning and data mining, has published numerous academic papers, and regularly releases code on her personal site, www.hilarymason.com.
She has discovered two new species, loves to bake cookies, and asks way too many questions.
Comments about O'Reilly Media Hilary Mason: An Introduction to Machine Learning with Web Data:
Incredibly useful to a novice. Wonderful production.
2/7/2012
5.0
Very Nice Introduction
By Gregory
from Seattle, WA
About Me Business Intelligence, Data Architect, Developer
Pros
Accurate
All code on GitHub
Easy to understand
Helpful examples
Well-written
Cons
Best Uses
Novice
Student
Comments about O'Reilly Media Hilary Mason: An Introduction to Machine Learning with Web Data:
I definitely agree with Patrick, this is an excellent introduction. If you want to learn more, you really need to enroll in a Data Sciences program such as Univ. Washington's Applied Computation and Mathematical Sciences (ACMS) program.
I really enjoyed this lecture -- I was drawn to it after attending the STRATA 2011 Bootcamp where she presented similar material. She's a wonderful speaker and lecturer, and her examples were well thought out and presented. Moreover, the material is immediately usable -- I'm using what I learned from this lecture right now on a project.
(Be sure to snag the code off GitHub -- it's pretty much the same Python code she uses in the lecture, with a few minor changes. But you can use that to practice.)
Next to bottom line -- Not only would I recommend this to a friend, I just recommended it this very evening to a colleague so he can help me out with a project.
11/27/2011
(3 of 3 customers found this review helpful)
5.0
The title is INTRODUCTION
By Patrick The Developer
from Temple Hills, MD
About Me Designer, Developer, Sys Admin
Pros
Concise
Easy to understand
Helpful examples
Cons
Best Uses
Novice
Student
Comments about O'Reilly Media Hilary Mason: An Introduction to Machine Learning with Web Data:
I am so tired of reading all of these comments such as "too basic", "not comprehensive enough", etc. The title says INTRODUCTION! By that title alone it should be apparent that it is not meant to be comprehensive or anything but basic. I grow tired of people who put in reviews simply to display how much smarter they are than the people authoring the book or teaching in a video.
If you want an introduction to machine learning with web data, I don't think you can do much better than this. If you are a CS PhD candidate specializing in data, you will probably find it too basic or not comprehensive enough.
"If you can't explain it to a six year old, you don't understand it yourself." ― Albert Einstein
11/7/2011
(1 of 1 customers found this review helpful)
4.0
Machine learning tech with web contents
By hu
from Tokyo Japan
About Me Developer
Pros
Accurate
Concise
Easy to understand
Helpful examples
Cons
Not comprehensive enough
Best Uses
Intermediate
Novice
Comments about O'Reilly Media Hilary Mason: An Introduction to Machine Learning with Web Data:
This video is aimed at machine learning beginners who want to see what machine learning is and what applications can be built with it. It starts with an explanation (30 minutes) of a basic machine learning method, Bayes' theorem, followed by sample application code (50 minutes) that classifies New York Times content into categories such as 'sports' or 'politics'. The material is delivered through the author's explanations and questions from the students. The basic machine learning session was easy to understand even for beginners because the explanation is well organized. For the application session, you need a working knowledge of basic Python syntax first, because Python itself is not explained. However, the sample code is very simple and doesn't use a standard library at all. The main parts of the sample code are explained precisely, so it is easy to grasp what each part does. Finally, the author introduces public web services that can be used for machine learning classification. Overall, you get a look at machine learning with web content in a short time. I suspect that people who already know the related technologies may find it boring; I have studied machine learning before, and the content was what I expected. However, the video provides a nice start for beginners.
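For readers who want a concrete picture of the kind of classifier this review describes, here is a minimal sketch of a Naive Bayes text classifier. It is not the course's code: the `NaiveBayes` class, the `tokenize` helper, the add-one smoothing, and the toy training sentences are all assumptions made for illustration, and nothing here calls the New York Times API.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


class NaiveBayes:
    """Hypothetical multinomial Naive Bayes with add-one smoothing."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # label -> word -> count
        self.doc_counts = Counter()              # label -> number of documents
        self.vocab = set()                       # every word seen in training

    def train(self, text, label):
        self.doc_counts[label] += 1
        for word in tokenize(text):
            self.word_counts[label][word] += 1
            self.vocab.add(word)

    def classify(self, text):
        total_docs = sum(self.doc_counts.values())
        best_label, best_score = None, float("-inf")
        for label in self.doc_counts:
            # Score = log P(label) + sum of log P(word | label).
            # Logs avoid underflow from multiplying many small numbers.
            score = math.log(self.doc_counts[label] / total_docs)
            total_words = sum(self.word_counts[label].values())
            for word in tokenize(text):
                count = self.word_counts[label][word] + 1  # add-one smoothing
                score += math.log(count / (total_words + len(self.vocab)))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Toy training data invented for this sketch (not real NYT content).
clf = NaiveBayes()
clf.train("the team won the game in overtime", "sports")
clf.train("the striker scored a late goal", "sports")
clf.train("the senate passed the budget bill", "politics")
clf.train("the president vetoed the bill", "politics")

result = clf.classify("the senate will vote on the bill")
print(result)
```

The sketch uses only Python's standard library, so it can be run as-is and then extended with real article text.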
8/5/2011
4.0
Great introduction to machine learning
By David
from France
About Me Developer, Sys Admin
Pros
Accurate
Concise
Easy to understand
Helpful examples
Cons
Best Uses
Novice
Student
Comments about O'Reilly Media Hilary Mason: An Introduction to Machine Learning with Web Data:
Hilary Mason's video is a great introduction to machine learning with web data. It covers both supervised and unsupervised learning, with lots of examples run during the class and live tweaking to improve understanding of the algorithms.
The video format, although slower to assimilate, provides visual feedback on the results of running code and is a great alternative to books for introductory classes. If you want to follow up on the subject, "Programming Collective Intelligence" is your next read.
7/17/2011
5.0
Excellent intro to machine learning
By Wilson Leoputra
from Perth, Australia
About Me Educator
Pros
Accurate
Concise
Easy to understand
Helpful examples
Cons
Too basic
Best Uses
Intermediate
Novice
Student
Comments about O'Reilly Media Hilary Mason: An Introduction to Machine Learning with Web Data:
This video presents a comprehensive introduction to machine learning, covering web data extraction, feature extraction, and supervised and unsupervised learning algorithms for classification problems, with examples and implementation code. In this video, Hilary Mason shows a few useful tools to extract and process web document data, including the NYTimes API, curl, WordNet, and wordnik.com. She also explains and runs through the implementation code in detail for both supervised learning (Naive Bayes) and unsupervised learning (clustering) algorithms to solve classification problems. Different feature models, such as stemming and phrase n-grams (bigrams and trigrams), are discussed briefly, along with strategies for dealing with large data. With such easy-to-follow and comprehensive content, this video is definitely useful for those who are interested in learning and applying machine learning techniques to web or text data.
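As a rough illustration of the bigram and trigram features mentioned in this review, the sketch below builds n-grams from a list of word tokens. The `ngrams` function name and the sample sentence are assumptions made for this example, not material from the course.

```python
def ngrams(tokens, n):
    """Return every run of n consecutive tokens as a tuple."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


# Sample sentence invented for this sketch.
tokens = "machine learning with web data".split()

bigrams = ngrams(tokens, 2)   # pairs of adjacent words
trigrams = ngrams(tokens, 3)  # triples of adjacent words

print(bigrams)
print(trigrams)
```

Each tuple can then be counted as a feature in the same way as a single word, which lets a classifier pick up short phrases as well as individual terms.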
7/2/2011
4.0
makes easy to play with data from web
By Anil
from Seattle, WA
About Me Developer
Pros
Easy to understand
Helpful examples
Cons
Not comprehensive enough
Best Uses
Novice
Comments about O'Reilly Media Hilary Mason: An Introduction to Machine Learning with Web Data:
The class plays with Delicious tags and content from the NYTimes API, and introduces you to ways of getting sample data that is easily available on the internet for analysis. As the title suggests, this is an introductory video series, enough to get started with. Some machine learning algorithms, and use cases for applying them to real-world data, are just introduced.
At the end, it will help you understand where machine learning is applied on various internet services and will definitely create enough curiosity to study further on this topic.
6/27/2011
5.0
Machine learning is way cool.
By Gregory Zentkovich
from Honolulu
About Me Developer, Sys Admin
Pros
Accurate
Concise
Easy to understand
Helpful examples
Cons
Best Uses
Intermediate
Novice
Comments about O'Reilly Media Hilary Mason: An Introduction to Machine Learning with Web Data:
I really enjoyed this video series on Machine Learning. At first I didn't know what to expect, but I felt Hilary did a great job of explaining what machine learning is and how it can be used to effectively analyze data extracted from web content. In fact, after the first couple of videos we were already jumping right into retrieving data from a live site using some of her own home-brewed code. It was really cool. Of course, if you are uncomfortable working on the command line, it could be intimidating at first, but I felt it was simple enough to follow along (after all, you can pause the video and do some googling if you really get stuck). I believe that by having us interact with the live data she made this video course on machine learning exponentially more exciting. Hilary also has a very unique way of communicating with her audience, and she just exudes passion for her field. No monotone dialog here; she gets you excited and wanting to learn more. I also felt the classroom setting, with six other people asking questions along the way, helped the learning process. The best part, though, is that because it's a video, you can stop, rewind, and hear complex content over and over again until you get it. All in all, it was a great video and I look forward to watching future releases on machine learning by Hilary.
6/25/2011
4.0
Learning About Learning
By ktabors
from Oakland, CA
About Me Developer
Pros
Informative
Cons
Best Uses
Novice
Student
Comments about O'Reilly Media Hilary Mason: An Introduction to Machine Learning with Web Data:
I was introduced to Hilary Mason when my wife told me there was a female computer scientist from bit.ly talking on a radio show. She likes to find things for me. :) That is why I wanted to get this, along with not knowing machine learning.
I am better informed about machine learning because this was a great introduction to the topic. It's already been useful for understanding and reviewing articles and code while researching a work project. It's done using Python. Code is provided, which she explains and teaches you to use and modify. Her primary examples are a trained algorithm (figuring out if something belongs to one set of data or another, like recommendations) and an algorithm that looks at a data set and determines clusters. It all seems very practical and relevant. I would have liked being walked through writing the code from scratch, but that would be the book version of this. :) I would also have liked to see more slides or code; there was a lot of focus on people talking. Having an unintroduced participatory audience was weird.
I received this free through the O'Reilly Blogger Review Program.
6/16/2011
5.0
A Good Intro for the Software Dev
By m2web
from Erlanger, KY
About Me Designer, Developer, Educator
Pros
Accurate
Concise
Easy to understand
Helpful examples
Cons
Best Uses
Intermediate
Comments about O'Reilly Media Hilary Mason: An Introduction to Machine Learning with Web Data:
The video itself is presented in five sections: (1) Introduction, (2) Classifying Web Documents - The Theory, (3) Classifying Web Documents - The Code, (4) Clustering, Recommendations, and Probability, and (5) Conclusion. In short, Hilary uses web-based data in the video to show the audience how to work with data and solve problems by using basic machine learning techniques. The video is particularly directed at programmers who do not have statistical training.
The viewer will sit with a group of a few other students and feel the intimate setting of a small classroom. For myself, a video where you can re-watch segments, stop the video to reference a suggested resource, or pause to experiment with a variant of the code is both helpful and handy. For example, in the introduction, Hilary references a link (http://bit.ly/9RYQEF) that explains the concept of "data science." At that point, I paused the video, browsed to the link, and found it quite informative. In the second section, Classifying Web Documents - The Theory, the audience is gently taken into statistical techniques such as naive Bayes and shown a step-by-step approach to how the math is applied. In Classifying Web Documents - The Code, the participant uses Python code and the New York Times API to classify words from the New York Times web site. In the Clustering, Recommendations, and Probability video, the viewer is taken through code that demonstrates how to take data about which little is known and learn from the clustering results. Finally, the conclusion section deals briefly with the concepts of probability and then reviews the entire session's content.
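To give a sense of what learning from clustering results can look like in code, here is a small sketch. The review does not say which clustering algorithm the course uses, so k-means is swapped in here as an assumption; the function names, the fixed starting centroids, and the 2-D sample points are likewise hypothetical.

```python
def mean(points):
    """Coordinate-wise average of a non-empty list of points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, centroids, iterations=10):
    """Plain k-means: assign points to the nearest centroid, then re-average."""
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: sq_dist(p, centroids[i]))
            clusters[nearest].append(p)
        # An empty cluster keeps its previous centroid.
        centroids = [mean(c) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    return centroids, clusters


# Toy 2-D data invented for this sketch: two loose groups of points.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5),
          (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]

# Starting centroids are fixed so the run is repeatable.
centroids, clusters = kmeans(points, centroids=[points[0], points[3]])
print(clusters)
```

With real web data, each point would be a feature vector for a document or user rather than a pair of coordinates, and the choice of starting centroids would matter more.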
Being able to navigate around in Python is beneficial, and following along by running the code helps one learn and retain more information, but the participant can also just view the video content, as both the code and concepts are displayed and explained. What is nice is that Hilary provides the code used in the video from her Git repository at https://github.com/hmason/ml_class. Viewers who want to participate will need to make sure that they have the proper Python modules installed.
In conclusion, the software developer who has taken little more than the required college stats class would do well to purchase this video. Seeing the actual application of code to the basic statistical algorithms is extremely informative and applicable in various problem domains.