Big Data Glossary
A Guide to the New Generation of Data Tools
Publisher: O'Reilly Media
Released: September 2011
Pages: 62

To help you navigate the large number of new data tools available, this guide describes 60 of the most recent innovations, from NoSQL databases and MapReduce approaches to machine learning and visualization tools. Descriptions are based on first-hand experience with these tools in a production environment.

This handy glossary also includes a chapter of key terms that help define many of these tool categories:

  • NoSQL Databases—Document-oriented databases using a key/value interface rather than SQL
  • MapReduce—Tools that support distributed computing on large datasets
  • Storage—Technologies for storing data in a distributed way
  • Servers—Ways to rent computing power on remote machines
  • Processing—Tools for extracting valuable information from large datasets
  • Natural Language Processing—Methods for extracting information from human-created text
  • Machine Learning—Tools that automatically perform data analyses, based on results of a one-off analysis
  • Visualization—Applications that present meaningful data graphically
  • Acquisition—Techniques for cleaning up messy public data sources
  • Serialization—Methods to convert data structure or object state into a storable format
Customer Reviews

REVIEW SNAPSHOT® by PowerReviews

Average rating: 5.0 (based on 2 reviews)

Ratings Distribution

  • 5 Stars (2)
  • 4 Stars (0)
  • 3 Stars (0)
  • 2 Stars (0)
  • 1 Star (0)

Reviewed by 2 customers

(1 of 1 customers found this review helpful)

 
5.0

High Level Overview Well Done

By shawnday

from Dublin, Ireland

About Me Designer, Developer, Educator

Verified Reviewer

Pros

  • Accurate
  • Concise
  • Well-written

Cons

    Best Uses

    • Intermediate
    • Novice
    • Student

    Comments about oreilly Big Data Glossary:

    The *Big* Data Glossary is actually a relatively *short* book, best enjoyed as an eBook in my estimation. This volume is similar to a number of recent releases from O'Reilly that have moved from being deep and comprehensive to providing a higher-level taste-test overview from a more conceptual standpoint. In this instance, the Big Data Glossary by Pete Warden could also be described as an annotated bibliography of the variety of tools and platforms that have recently emerged to work with linked data or large and rich datasets.

    This glossary moves through the basic services and components that could be employed to create a comprehensive research environment to conduct data-mining or to create a deep visualisation for analysis. The concise volume is designed to provide a context for further exploration of the various tools and services defined and offers useful links for such exploration. The anticipated audience for this volume might be an academic researcher new to the areas mentioned or a developer transitioning from a more traditional data background. Although brief, the volume does much to draw together a qualified list of services and accomplishes much by identifying the stronger current players and summarizing the strengths and weaknesses of each. In this regard you might consider this book more of a technical industry survey. It is a valuable wee tome for getting up to speed quickly with the players and knowing how you might judge services within a particular category as diverse as on-demand storage, data visualisation or natural language processing. Much like Designing Data Visualisations, which I previously reviewed, this volume too could fit very nicely into an introductory syllabus and provide an excellent guide for an introduction to data processing or digital research methodologies.

    I have no criticisms of this book. It's short and concise and although you'd certainly like more info, it does what it bills itself to do. And it does it well. It is again the sort of book that lends itself to an electronic format, as the content by definition is constantly changing and evolving. If anything, the ways in which the various services are described textually probably could be accomplished in a tabular format which would facilitate better cross-service evaluation of features, strengths and weaknesses, but that's what Wikipedia is for. The descriptions here are brief enough that you will read through at least a chapter as a whole (if not the entire volume) and come away with an informed understanding of a particular space.

    I would recommend this book to anyone needing to quickly bring themselves up to speed on the available services in a specific area of data processing, those wishing to keep current with emerging players or those that are facing developing requirements documents that may need to provide definite technological references (or for that matter want to speak in real world terms about conceptual solutions).

     
    5.0

    A good introduction to big data tools

    By Michele Milesi

    from Sorisole, BG - Italy

    About Me Designer, Developer

    Verified Reviewer

    Pros

    • Accurate
    • Concise
    • Easy to understand
    • Well-written

    Cons

    • Missing reference list
    • Too basic

    Best Uses

    • Novice
    • Student

    Comments about oreilly Big Data Glossary:

    The book is a good starting point for those who have to deal with big data and have no knowledge of the subject (like me).
    As befits a glossary, no term is described in depth, but each entry gives some hints about similar tools and suggests when you may find it useful to explore that tool.

    Experienced readers may find the description of a well-known term too brief, but the glossary is so extensive that they can find new tools to investigate.

    In my opinion the book lacks a complete reference list, but a short internet search can make up for that defect.

    I read this book as part of the O'Reilly Blogger Review Program, and O'Reilly gave me a free copy.

    Buying Options
    Ebook: $14.99
    Formats:  DAISY, ePub, Mobi, PDF
    Print & Ebook: $21.99
    Print: $19.99