To help you navigate the large number of new data tools available, this guide describes 60 of the most recent innovations, from NoSQL databases and MapReduce approaches to machine learning and visualization tools. Descriptions are based on first-hand experience with these tools in a production environment.
This handy glossary also includes a chapter of key terms that help define many of these tool categories:
NoSQL Databases—Document-oriented databases using a key/value interface rather than SQL
MapReduce—Tools that support distributed computing on large datasets
Storage—Technologies for storing data in a distributed way
Servers—Ways to rent computing power on remote machines
Processing—Tools for extracting valuable information from large datasets
Natural Language Processing—Methods for extracting information from human-created text
Machine Learning—Tools that automatically perform data analyses, based on results of a one-off analysis
Visualization—Applications that present meaningful data graphically
Acquisition—Techniques for cleaning up messy public data sources
Serialization—Methods to convert data structure or object state into a storable format
A quick overview of the latest big data technologies.
Comments about O'Reilly Big Data Glossary:
I was kinda skeptical when I first had my hands on these 60 pages and, after getting through them (that did not take long), I'm still kinda puzzled. I did not get confused by the content, no. I guess if you decide to read this book, you must know what to expect from it, or else you will end up pretty much disgusted with both the money and time wasted.
Let's make it clear: this book doesn't teach you anything. After thirty minutes or so, when you reach the back cover, you will not have learned anything. But chances are you will open Iceweasel, or whatever your favorite browser is, and go search for more information about some of the tools the author described.
And that's the one and only aim of the author: to make you wanna know more about some specific application that you were not aware of. I did indeed search for something, so, in that sense, mission accomplished.
Now, would I suggest reading the book? Yea, why not. You can always find out someone is developing something that could be useful to you.
Would I suggest buying the book? No. For a couple of reasons:
It's not worth the money. This book should not be a book, but rather some kind of weekly newsletter. Better still, O'Reilly should make sure that Amazon and any other bookstore give you a copy of this title whenever you purchase an IT book, to show you the latest technologies and tell you: hey, we've got a book covering that subject!
The book is outdated already. It's from 2011. Technology advances so fast that what was hot and cool four years ago has probably been replaced by something else by now.
As usual, you can find more reviews on my personal blog: http://books.lostinmalloc.com. Feel free to pass by and share your thoughts!
Bottom Line No, I would not recommend this to a friend
The *Big* Data Glossary is actually a relatively *short* book, best enjoyed as an eBook in my estimation. This volume is similar to a number of recent releases from O'Reilly that have moved from being deep and comprehensive to providing a higher-level taste-test overview from a more conceptual standpoint. In this instance, the Big Data Glossary by Pete Warden could also be described as an annotated bibliography of the variety of tools and platforms that have recently emerged to work with linked data or large and rich datasets.
This glossary moves through the basic services and components that could be employed to create a comprehensive research environment to conduct data-mining or to create a deep visualisation for analysis. The concise volume is designed to provide a context for further exploration of the various tools and services defined and offers useful links for such exploration. The anticipated audience for this volume might be an academic researcher new to the areas mentioned or a developer transitioning from a more traditional data background. Although brief, the volume does much to draw together a qualified list of services and accomplishes much by identifying the stronger current players and summarizing the strengths and weaknesses of each. In this regard you might consider this book more of a technical industry survey. It is a valuable wee tome for getting up to speed quickly with the players and knowing how you might judge services within a particular category as diverse as on-demand storage, data visualisation or natural language processing. Much like Designing Data Visualisations, which I previously reviewed, this volume too could fit very nicely into an introductory syllabus and provide an excellent guide for an introduction to data processing or digital research methodologies.
I have no criticisms of this book. It's short and concise and although you'd certainly like more info, it does what it bills itself to do. And it does it well. Again, it is the sort of book that lends itself to an electronic format, as the content by definition is constantly changing and evolving. If anything, the textual descriptions of the various services could probably be presented in a tabular format, which would facilitate better cross-service evaluation of features, strengths and weaknesses, but that's what Wikipedia is for. The descriptions here are brief enough that you will read through at least a chapter as a whole (if not the entire volume) and come away with an informed understanding of a particular space.
I would recommend this book to anyone needing to quickly bring themselves up to speed on the available services in a specific area of data processing, those wishing to keep current with emerging players, or those who have to develop requirements documents that may need to provide definite technological references (or, for that matter, want to speak in real-world terms about conceptual solutions).
Bottom Line Yes, I would recommend this to a friend
The book is a good starting point for those who have to deal with big data and have no knowledge of the subject (like me). As is to be expected of a glossary, each term is not described in depth, but the book gives some hints about similar tools and suggests when you may find it useful to explore a given tool.
Experienced people may find the description of a well-known term too brief, but the glossary is so large that they can find new tools to investigate.
In my opinion the book lacks a complete reference list, but a short internet search can make up for that defect.
I read this book as part of the O'Reilly Blogger Review Program, and O'Reilly gave me a free copy of the book.
Bottom Line Yes, I would recommend this to a friend