Getting Started with Impala
Interactive SQL for Apache Hadoop
Publisher: O'Reilly Media
Final Release Date: September 2014
Pages: 110

Learn how to write, tune, and port SQL queries and other statements for a Big Data environment, using Impala, the massively parallel processing SQL query engine for Apache Hadoop. The best practices in this practical guide help you design database schemas that not only interoperate with other Hadoop components and are convenient for administrators to manage and monitor, but also accommodate future growth in data size and the evolution of software capabilities.

Ideal for database developers and business analysts, Getting Started with Impala includes advice from Cloudera’s development team, as well as insights from its consulting engagements with customers.

  • Learn how Impala integrates with a wide range of Hadoop components
  • Attain high performance and scalability for huge data sets on production clusters
  • Explore common developer tasks, such as porting code to Impala and optimizing performance
  • Use tutorials for working with billion-row tables, date- and time-based values, and other techniques
  • Learn how to transition from rigid schemas to a flexible model that evolves as needs change
  • Take a deep dive into joins and the roles of statistics
Customer Reviews

Review Snapshot: 4.0 out of 5 (based on 2 reviews)

Ratings Distribution

  • 5 Stars: 0
  • 4 Stars: 2
  • 3 Stars: 0
  • 2 Stars: 0
  • 1 Star: 0


(1 of 1 customers found this review helpful)

Rating: 4.0

An excellent reference and an eye opener

By ArthurZ

from Toronto, Canada

About Me: Developer

Pros

  • Accurate
  • Concise
  • Easy to understand
  • Helpful examples
  • Well-written

Cons

  • Not comprehensive enough
  • Too basic

Best Uses

  • Novice
  • Student

Comments about O'Reilly Getting Started with Impala:

Impala is a recent but very valuable addition to the Hadoop ecosystem. After reading the book, I must say Cloudera has taken a big step in the right direction.

The rationale behind bringing Impala to life is the proliferation of SQL. SQL as a language has many flavours, but in one form or another it is already known to data practitioners coming to Hadoop from various platforms and DBMSs. Impala implements a subset of the ANSI SQL-92 specification; even so, the subset is powerful enough to make a developer productive. In my opinion, since SQL is based on algebra and sets, and because HDFS (Hadoop) simply exposes datasets, Impala is the right choice for DML and DDL even in Big Data projects.

At 110 pages the book is not terribly long, but bear in mind that Impala as a product is still under active development. As a bonus, the author works at Cloudera and has a close relationship with the product, which is a big plus and results in top professional content. John divided the book into two parts: the first and largest covers Impala's implementation and its role in data analysis and processing; the second covers the most commonly used tasks, pitfalls, and general advice and techniques.

What I did not find was more on how to use Impala with Hive, Sqoop, HBase, and Pig, so I am taking one star off my rating for this.

Let me reiterate: the book covers Impala as part of Cloudera's Hadoop distribution. If you are using a different distribution, Impala is not part of it.

As I said, I am giving this book 4 out of 5 stars. Good work, John!

Disclaimer: the book was provided to me for free as part of O'Reilly's blogger reviewer programme.

(1 of 1 customers found this review helpful)

Rating: 4.0

Finally some good Impala documentation!

By codingtony

from Montreal, QC, Canada

About Me: Developer

Verified Buyer

Pros

  • Accurate
  • Concise
  • Easy to understand
  • Helpful examples
  • Well-written

Cons

  • Could Have More Material

Best Uses

  • Intermediate
  • Novice
  • Student

Comments about O'Reilly Getting Started with Impala:

Before reading the book, I already had some experience with Impala (1.3 & 1.4).

I really enjoyed the book and liked the examples. It gave me some new ideas on how to tackle problems, such as using views to quickly create partitioned tables.

It is a quick read. (~100 pages)

If there is one con, it is that the book could have gone further and covered more topics.

Buying Options
Ebook: $25.99
Formats: DAISY, ePub, Mobi, PDF
Print & Ebook: $32.99
Print: $29.99