Building Big Data Platforms

Video description

What kinds of platforms have Netflix, LinkedIn, CERN, and PayPal constructed to handle big data operations unique to their businesses? And how can you apply some of these solutions to your own business? The answers lie in this O’Reilly video collection, taken from live sessions at Strata + Hadoop World 2015 in San Jose, California.

This video collection includes:

Big Data at Netflix: Faster and Easier
Kurt Brown (Netflix)
Learn the technologies that drive the Netflix Data Platform (Hadoop 2, Pig on Tez, Presto on AWS), as well as the motivations behind their architecture and approach, and the benefits that they (and hopefully you) can achieve.

Building Interactive Data Applications at Scale
Fangjin Yang (Metamarkets), Vadim Ogievetsky (Independent)
Find out how to build data applications for visualizing, navigating, and interpreting reams of data, using the facet.js data query framework on the front end and the Druid open source data store on the back end.

Open Source Real Time BI using Storm, Hadoop, Titan, Druid & D3
Anil Madan (PayPal)
Get acquainted with PayPal’s behavioral analytics lineup: Storm and Hadoop in the real-time analytics pipeline, Druid as a real-time distributed OLAP metrics store, the D3 visualization framework, and Titan and Gremlin for visitor pathing and funnel analytics.

Building Real-time Data Products at LinkedIn with Apache Samza
Martin Kleppmann (Independent)
Sometimes you need to process data continuously and react to it within a few seconds. Learn how LinkedIn uses Apache Samza to solve real-time data problems, and understand how you can structure real-time data pipelines for scale and flexibility.

An Open Source Approach to Gathering and Analyzing Device Sourced Health Data
Ian Eslick (VitalLabs)
Discover how VitalLabs captures and integrates device-based and other health data for research, using the Switchboard application for routing data and the Trusted Analytic Container (TAC) for consolidating data for analytics.

Ticketmaster: Marketing and Selling the World's Tickets
John Carnahan (Ticketmaster)
Learn about the solutions that Ticketmaster uses for ticket sales and marketing, including Storm for stream processing, trend prediction, and anomaly detection; and Kafka, Storm, and HBase for real-time "n-squared" marketing.

Unlocking Big Data at CERN
Matthias Braeger (CERN), Manish Devgan (Software AG Terracotta)
Unlock the architecture of CERN projects, including C2MON (CERN’s Control & Monitoring Platform), that leverage Hadoop and the Terracotta In-Memory Data Platform to gain real-time insights from sensor data.

Unboxing Data Startups
Michael Abbott (Kleiner Perkins Caufield & Byers)
Investor and entrepreneur Michael Abbott unboxes three startups to look at the technology, architecture, and innovations they’ve harnessed to deliver their products and services.

Product information

  • Title: Building Big Data Platforms
  • Author(s):
  • Release date: June 2015
  • Publisher(s): O'Reilly Media, Inc.
  • ISBN: 9781491931035