Introduction to Apache Kafka
A Quick Primer for Developers and Administrators
Publisher: O'Reilly Media
Final Release Date: March 2015
Run time: 2 hours 55 minutes

Currently one of the hottest projects across the Hadoop ecosystem, Apache Kafka is a distributed, real-time data system that functions in a manner similar to a pub/sub messaging service, but with better throughput, built-in partitioning, replication, and fault tolerance. In this video course, host Gwen Shapira from Cloudera shows developers and administrators how to integrate Kafka into a data processing pipeline.

You’ll start with Kafka basics, walk through code examples of Kafka producers and consumers, and then learn how to integrate Kafka with Hadoop. By the end of this course, you’ll be ready to use this service for large-scale log collection and stream processing.

  • Learn Kafka’s use cases and the problems that it solves
  • Understand the basics, including logs, partitions, replicas, consumers, and producers
  • Set up a Kafka cluster, starting with a single node before adding more
  • Write producers and consumers, using old and new APIs
  • Use the Flume log aggregation framework to integrate Kafka with Hadoop
  • Configure Kafka for availability and consistency, and learn how to troubleshoot various issues
  • Become familiar with the entire Kafka ecosystem

Gwen Shapira is a software engineer at Cloudera with 15 years of experience working with customers to design scalable data architectures. Working as a data warehouse DBA, ETL developer, and a senior consultant, Gwen specializes in building scalable data processing pipelines and integrating existing data systems with Hadoop. She’s a committer to Apache Sqoop and an active contributor to Apache Kafka.

Customer Reviews

Average rating: 5.0 (based on 2 reviews)

Ratings Distribution

  • 5 Stars (2)
  • 4 Stars (0)
  • 3 Stars (0)
  • 2 Stars (0)
  • 1 Star (0)


(2 of 2 customers found this review helpful)

 
5.0

An excellent intro and then some

By Gideon, from Israel
About Me: Developer
Verified Buyer

Pros

  • Accurate
  • Concise
  • Easy to understand
  • Helpful examples
  • Well-written

Cons: none listed

Best Uses

  • Intermediate

Comments about Introduction to Apache Kafka:

I bought this together with "Intro to Big Data" for a discount, and it was well worth it. I downloaded it and watched it offline (mp4 files).

It's great to learn about Kafka and its ecosystem from Gwen, who is part of the team that develops Kafka and other open-source big-data projects.

This course gives a thorough introduction to Kafka: how it works, how to set it up correctly, how to troubleshoot it, and how to integrate it with other systems.

My only gripe is that the video quality of the mp4 files is a bit lacking, and many slides and screencasts are not clear (compression, compression, compression...).
Also, the videos could use some editing.

(3 of 6 customers found this review helpful)

5.0

Excellent videos for getting up to speed

By Ramumar, from Singapore
About Me: Domain Architect
Verified Reviewer

Pros

  • Easy to understand
  • Helpful examples

Cons: none listed

Best Uses: none listed

Comments about Introduction to Apache Kafka:

Many medium-to-large enterprises have data integration problems because of the myriad systems they run today. Kafka will be one of the more important pieces of application infrastructure that most of them will want to have. Learning about Kafka is a must for any enterprise architect in 2015 and beyond.

Buying Options
Video: $59.99 (Streaming, Downloadable)