Table of Contents

  1. Chapter 1 Introduction

    1. An Overview of Hadoop and MapReduce

    2. Hive in the Hadoop Ecosystem

    3. Java Versus Hive: The Word Count Algorithm

    4. What’s Next

  2. Chapter 2 Getting Started

    1. Installing a Preconfigured Virtual Machine

    2. Detailed Installation

    3. What Is Inside Hive?

    4. Starting Hive

    5. Configuring Your Hadoop Environment

    6. The Hive Command

    7. The Command-Line Interface

  3. Chapter 3 Data Types and File Formats

    1. Primitive Data Types

    2. Collection Data Types

    3. Text File Encoding of Data Values

    4. Schema on Read

  4. Chapter 4 HiveQL: Data Definition

    1. Databases in Hive

    2. Alter Database

    3. Creating Tables

    4. Partitioned, Managed Tables

    5. Dropping Tables

    6. Alter Table

  5. Chapter 5 HiveQL: Data Manipulation

    1. Loading Data into Managed Tables

    2. Inserting Data into Tables from Queries

    3. Creating Tables and Loading Them in One Query

    4. Exporting Data

  6. Chapter 6 HiveQL: Queries

    1. SELECT … FROM Clauses

    2. WHERE Clauses

    3. GROUP BY Clauses

    4. JOIN Statements

    5. ORDER BY and SORT BY

    6. DISTRIBUTE BY with SORT BY

    7. CLUSTER BY

    8. Casting

    9. Queries that Sample Data

    10. UNION ALL

  7. Chapter 7 HiveQL: Views

    1. Views to Reduce Query Complexity

    2. Views that Restrict Data Based on Conditions

    3. Views and Map Type for Dynamic Tables

    4. View Odds and Ends

  8. Chapter 8 HiveQL: Indexes

    1. Creating an Index

    2. Rebuilding the Index

    3. Showing an Index

    4. Dropping an Index

    5. Implementing a Custom Index Handler

  9. Chapter 9 Schema Design

    1. Table-by-Day

    2. Over Partitioning

    3. Unique Keys and Normalization

    4. Making Multiple Passes over the Same Data

    5. The Case for Partitioning Every Table

    6. Bucketing Table Data Storage

    7. Adding Columns to a Table

    8. Using Columnar Tables

    9. (Almost) Always Use Compression!

  10. Chapter 10 Tuning

    1. Using EXPLAIN

    2. EXPLAIN EXTENDED

    3. Limit Tuning

    4. Optimized Joins

    5. Local Mode

    6. Parallel Execution

    7. Strict Mode

    8. Tuning the Number of Mappers and Reducers

    9. JVM Reuse

    10. Indexes

    11. Dynamic Partition Tuning

    12. Speculative Execution

    13. Single MapReduce MultiGROUP BY

    14. Virtual Columns

  11. Chapter 11 Other File Formats and Compression

    1. Determining Installed Codecs

    2. Choosing a Compression Codec

    3. Enabling Intermediate Compression

    4. Final Output Compression

    5. Sequence Files

    6. Compression in Action

    7. Archive Partition

    8. Compression: Wrapping Up

  12. Chapter 12 Developing

    1. Changing Log4J Properties

    2. Connecting a Java Debugger to Hive

    3. Building Hive from Source

    4. Setting Up Hive and Eclipse

    5. Hive in a Maven Project

    6. Unit Testing in Hive with hive_test

    7. The New Plugin Developer Kit

  13. Chapter 13 Functions

    1. Discovering and Describing Functions

    2. Calling Functions

    3. Standard Functions

    4. Aggregate Functions

    5. Table Generating Functions

    6. A UDF for Finding a Zodiac Sign from a Day

    7. UDF Versus GenericUDF

    8. Permanent Functions

    9. User-Defined Aggregate Functions

    10. User-Defined Table Generating Functions

    11. Accessing the Distributed Cache from a UDF

    12. Annotations for Use with Functions

    13. Macros

  14. Chapter 14 Streaming

    1. Identity Transformation

    2. Changing Types

    3. Projecting Transformation

    4. Manipulative Transformations

    5. Using the Distributed Cache

    6. Producing Multiple Rows from a Single Row

    7. Calculating Aggregates with Streaming

    8. CLUSTER BY, DISTRIBUTE BY, SORT BY

    9. GenericMR Tools for Streaming to Java

    10. Calculating Cogroups

  15. Chapter 15 Customizing Hive File and Record Formats

    1. File Versus Record Formats

    2. Demystifying CREATE TABLE Statements

    3. File Formats

    4. Record Formats: SerDes

    5. CSV and TSV SerDes

    6. ObjectInspector

    7. Think Big Hive Reflection ObjectInspector

    8. XML UDF

    9. XPath-Related Functions

    10. JSON SerDe

    11. Avro Hive SerDe

    12. Binary Output

  16. Chapter 16 Hive Thrift Service

    1. Starting the Thrift Server

    2. Setting Up Groovy to Connect to HiveService

    3. Connecting to HiveServer

    4. Getting Cluster Status

    5. Result Set Schema

    6. Fetching Results

    7. Retrieving Query Plan

    8. Metastore Methods

    9. Administrating HiveServer

    10. Hive ThriftMetastore

  17. Chapter 17 Storage Handlers and NoSQL

    1. Storage Handler Background

    2. HiveStorageHandler

    3. HBase

    4. Cassandra

    5. DynamoDB

  18. Chapter 18 Security

    1. Integration with Hadoop Security

    2. Authentication with Hive

    3. Authorization in Hive

  19. Chapter 19 Locking

    1. Locking Support in Hive with Zookeeper

    2. Explicit, Exclusive Locks

  20. Chapter 20 Hive Integration with Oozie

    1. Oozie Actions

    2. A Two-Query Workflow

    3. Oozie Web Console

    4. Variables in Workflows

    5. Capturing Output

    6. Capturing Output to Variables

  21. Chapter 21 Hive and Amazon Web Services (AWS)

    1. Why Elastic MapReduce?

    2. Instances

    3. Before You Start

    4. Managing Your EMR Hive Cluster

    5. Thrift Server on EMR Hive

    6. Instance Groups on EMR

    7. Configuring Your EMR Cluster

    8. Persistence and the Metastore on EMR

    9. HDFS and S3 on EMR Cluster

    10. Putting Resources, Configs, and Bootstrap Scripts on S3

    11. Logs on S3

    12. Spot Instances

    13. Security Groups

    14. EMR Versus EC2 and Apache Hive

    15. Wrapping Up

  22. Chapter 22 HCatalog

    1. Introduction

    2. MapReduce

    3. Command Line

    4. Security Model

    5. Architecture

  23. Chapter 23 Case Studies

    1. m6d.com (Media6Degrees)

    2. Outbrain

    3. NASA’s Jet Propulsion Laboratory

    4. Photobucket

    5. SimpleReach

    6. Experiences and Needs from the Customer Trenches

  1. Glossary

  2. Appendix References

  3. Colophon