How do you use R to import, manage, visualize, and analyze real-world data? With this short, hands-on tutorial, you learn how to collect online data, massage it into a reasonable form, and work with it using R facilities to interact with web servers, parse HTML and XML, and more. Rather than use canned sample data, you'll plot and analyze current home foreclosure auctions in Philadelphia.
This practical mashup exercise shows you how to access spatial data in several formats locally and over the Web to produce a map of home foreclosures. It's an excellent way to explore how the R environment works with R packages and performs statistical analysis.
Parse messy data from public foreclosure auction postings
Plot the data using R's PBSmapping package
Import US Census data to add context to foreclosure data
Use R's lattice and latticeExtra packages for data visualization
Create multidimensional correlation graphs with the pairs() scatterplot matrix function
Jeremy Leipzig is a bioinformatics software developer at DuPont Crop Genetics. He has conducted academic research in viral integration, metagenomics, schizophrenia, and alternative splicing. While a graduate student, he developed one of the first faculty-review websites and wrote "Work Issues in Software Engineering", a survey-based study of "death march" projects.
Xiao-Yi Li is a biostatistician with an M.Sc. from the University of Michigan. In fact, her entire educational experience has revolved around statistics, percentile or otherwise. Currently, she works in the bioinformatics group at DuPont as a statistical consultant. Her work consists mostly of design of experiments and analysis for phenotypic screens, quality control in microarrays, and association mapping.
The book Data Mashups in R gives a short tutorial on using R to pull data from the web and analyse it. The specific example it works through is foreclosure auctions in Philadelphia. I used this book to get some practical, hands-on exposure to R. It happily fulfilled this use case for me, introducing R packages and helping me gain an idea of how R can be used. It also gave me a good idea of what R does well, helping me know when to pull the R tool out of the shed.
I recommend the book to experienced developers who want to get their heads around what is possible with R and how R works by trying it out.
[this book was reviewed as a part of the O'Reilly Blogger Review Program]
3/23/2011
(1 of 2 customers found this review helpful)
3.0
Binding data sources in R
By Michal Konrad Owsiak
from Poland
Pros
Accurate
Concise
Cons
Too condensed
Best Uses
Expert
Intermediate
Comments about O'Reilly Media Data Mashups in R:
Have you ever wondered whether R can use regular expressions? Have you been forced to download data from a particular source before you could start using it within R? Or maybe you were not quite sure how to deal with XML within R scripts. Well, that's what Data Mashups is all about. Jeremy and Xiao-Yi show you how to deal with all these aspects. They do so in a very condensed way, but you still get a feeling for what R and scripting are all about. You will find here regular expressions, XML parsing, how to use the PBSmapping package, and a description of how to combine all of this within a single project.
The book is quite interesting in terms of its topic. However, it looks a little bit messy. I would expect to get an idea of the problem we want to solve before we start solving it. Well, not this time. Jeremy and Xiao-Yi skip this part and jump straight into the solution, which makes it harder to follow the ideas presented in the book. I prefer to be offered a problem before I start looking for a solution. The question here is whether, for this kind of topic, an essay is really enough. I don't know. I am still getting through R and its "traps", and honestly, I would choose other R-related titles from O'Reilly over Data Mashups. If you are starting your adventure with R, choose "R Cookbook" or "25 Recipes for Getting Started with R". If you are already familiar with R and want to go beyond what can be called standard, go ahead with Data Mashups.
The idea of the book is very good. The execution is not the best. I can value the solutions and the code snippets that are shown, since you can always reuse them at some point in your own projects. However, the way everything is bound together and presented doesn't quite appeal to me.