Mining big data requires a deep investment in people and time. How can you be sure you’re building the right models? With this hands-on book, you’ll learn a flexible toolset and methodology for building effective analytics applications with Hadoop.
Using lightweight tools such as Python, Apache Pig, and the D3.js library, your team will create an agile environment for exploring data, starting with an example application to mine your own email inboxes. You’ll learn an iterative approach that enables you to quickly change the kind of analysis you’re doing, depending on what the data is telling you. All example code in this book is available as working Heroku apps.
Create analytics applications by using the agile big data development methodology
Build value from your data in a series of agile sprints, using the data-value stack
Gain insight by using several data structures to extract multiple features from a single dataset
Visualize data with charts, and expose different aspects through interactive reports
Use historical data to predict the future, and translate predictions into action
Get feedback from users after each sprint to keep your project on track
Russell Jurney cut his data teeth in casino gaming, building web apps to analyze the performance of slot machines in the US and Mexico. After dabbling in entrepreneurship, interactive media, and journalism, he moved to Silicon Valley to build analytics applications at scale at Ning and LinkedIn. He lives on the ocean in Pacifica, California, with his wife Kate and two fuzzy dogs.
The animal on the cover of Agile Data Science is a silvery marmoset (Mico argentatus). These small New World monkeys live in the eastern parts of the Amazon rainforest and Brazil. Despite their name, silvery marmosets can range in color from near-white to dark brown. Brown marmosets have hairless ears and faces and are sometimes referred to as bare-ear marmosets. Reaching an average size of 22 cm, marmosets are about the size of squirrels, which makes their travel through tree canopies and dense vegetation very easy.
Silvery marmosets live in extended families of around twelve, where all the members help care for the young. Marmoset fathers carry their infants around during the day and return them to the mother every two to three hours to be fed. Babies wean from their mother’s milk at around six months and full maturity is reached at one to two years old.
The marmoset’s diet consists mainly of sap and tree gum. They use their sharp teeth to gouge holes in trees to reach the sap, and will occasionally eat fruit, leaves, and insects as well. As the deforestation of the rainforest continues, however, marmosets have begun to eat food crops grown by people; as a result, many farmers view them as pests. Large-scale extermination programs are underway in agricultural areas, and it is still unclear what impact this will have on the overall silvery marmoset population.
Because of their small size and mild disposition, marmosets are regularly used as subjects of medical research. Studies on the fertilization, placental development, and embryonic stem cells of marmosets may reveal the causes of developmental problems and genetic disorders in humans. Outside of the lab, marmosets are popular at zoos because they are diurnal (active during daytime) and full of energy; their long claws mean they can quickly move around in trees, and both males and females communicate with loud vocalizations.
I'm really enjoying going through this big data tutorial and learning a lot.
Interestingly, I've toyed with nearly all the technologies being used and thought I understood the value of big data. I even have some map-reduce analytic jobs running to provide real value.
This book made the 'agile' part click and made me look at my analytic workflow like any other software process. Just as I focus on optimizing my tooling for automating, compiling, and testing applications, I can see how easy it could be to have a similar workflow for BI.
I like the writing style and the pace. The author calls out some common traps without spending too much time on installation and tool details best left to the project websites.
I'd like to see a part II of this where these techniques are blended with SQL data and maybe data warehouses.
Bottom line: Yes, I would recommend this to a friend.