Table of Contents

Chapter: Keynotes

Possibilities powered by the cloud - Tom Reilly (Cloudera), Charles Zedlewski (Cloudera)

12m 38s

Building the metadata highway (sponsored by IBM) - Mandy Chessell (IBM)

09m 24s

The science of visual interactions - Miriam Redi (Bell Labs Cambridge, UK)

13m 40s

Machine learning is a moonshot for us all (sponsored by Google) - Darren Strange (Google)

05m 42s

What Kaggle has learned from almost a million data scientists - Anthony Goldbloom (Kaggle)

15m 34s

Another one bytes the dust (sponsored by Dell EMC) - Paul Brook (Dell EMC)

05m 26s

The data subject first? - Aurélie Pols (Mind Your Group by Mind Your Privacy)

09m 24s

Real-time intelligence gives Uber the edge - M. C. Srivas (Uber)

13m 26s

Lessons from piloting the London Office of Data Analytics - Eddie Copeland (Nesta)

14m 1s

Accelerate analytics and AI innovations with Intel (sponsored by Intel) - Ziya Ma (Intel Corp)

11m 5s

Enabling data science in the enterprise - Mike Olson (Cloudera), Tom Smith (Office for National Statistics)

10m 36s

Is finance ready for AI? - Aida Mehonic (ASI Data Science)

14m 21s

Peeking into the black box: Lessons from the front lines of machine-learning product launches - Grace Huang (Pinterest)

12m 35s

Using AI to create new jobs - Tim O'Reilly (O'Reilly Media)

28m 41s

Chapter: FinData

Crossing the river by feeling the stones - Simon Wardley (Leading Edge Forum)

34m 13s

Chapter: Sponsored

Deep Learning: Assessing Analytics Project Feasibility and Its Computational Requirements - Adam Grzywaczewski (NVIDIA LTD)

37m 10s

Architecting the future: Insights learned from Google’s journey in data - Darren Strange (Google)

41m 42s

The added value of data science - Jan Willem Gehrels (IBM Corporation)

29m 27s

Build big data enterprise solutions faster on Azure HDInsight - Pranav Rastogi (Microsoft)

41m 5s

Architecture best practices for big data deployments - Cory Minton (EMC)

39m 51s

Migrating petabyte-scale Hadoop clusters with zero downtime - Alon Elishkov (Outbrain)

39m 31s

Replication as a service - Eric Lotter (WANdisco)

45m 24s

Ingest, process, analyze: Automation and integration through the big data journey - Neil Cullum (BMC Software), Alon Lebenthal (BMC Software)

34m 58s

The digital twin: Real and gaining ground - Shree Dandekar (Honeywell)

38m 41s

Empowering data analytics: Real-life use cases - Martin Oberhuber (Think Big, a Teradata company)

41m 45s

Chapter: Data Case Studies

Making the future happen sooner - Alistair Croll (Solve For Interesting)

31m 19s

The mystery of the vanishing pins: Building a sustainable content ecosystem at Pinterest - Grace Huang (Pinterest)

37m 37s

Applying machine and deep learning to unleash value in the automotive industry - Josef Viehhauser (BMW Group), Dominik Schniertshauer (BMW Group)

36m 51s

TensorFlow in the wild; Or, the democratization of machine intelligence - Kazunori Sato (Google)

39m 41s

Chapter: Data-driven business management

The five dysfunctions of a data engineering team - Jesse Anderson (Big Data Institute)

44m 16s

Principles of data science management - David Martinez Rego (DataSpartan)

40m 30s

Chapter: Data science and advanced analytics

AI within O'Reilly Media - Paco Nathan (O'Reilly Media)

46m 23s

Machine learning with partial and biased feedback - Damien Lefortier (Facebook)

37m 58s

Enterprise artificial intelligence - Laura Frolich (Think Big, a Teradata company)

33m 18s

Reducing neural-network training time through hyperparameter optimization - Amitai Armon (Intel), Yahav Shadmi (Intel)

27m 2s

Distributed deep learning on AWS using Apache MXNet - Anima Anandkumar (UC Irvine)

40m 33s

TensorFlow and deep learning (without a PhD) - Martin Görner (Google)

40m 26s

Deep learning in practice - Mikio Braun (Zalando SE)

42m 58s

The state of TensorFlow and where it is going in 2017 - Sherry Moore (Google)

37m 36s

Tensor abuse in the workplace - Ted Dunning (MapR Technologies)

33m 50s

What does your postcode say about you? A technique to understand rare events based on demographics - Gary Willis (ASI)

33m 42s

Relevancer: Finding and labeling relevant information in tweet collections - Ali Hürriyetoglu (Statistics Netherlands), Nelleke Oostdijk (Radboud University)

27m 49s

Deep learning with Microsoft Cognitive Toolkit - Barbara Fusinska (Microsoft)

41m 35s

Machine learning to automate localization with Apache Spark and other open source tools - Michelle Casbon (Qordoba)

38m 45s

Conversation AI: From theory to the great promise - Yishay Carmiel (Spoken Communications)

40m 20s

Video anomaly detection with self-supervised deep nets - Arshak Navruzyan (Startup.ML)

32m 32s

When models go rogue: Hard-earned lessons about using machine learning in production - David Talby (Atigeo)

40m 53s

Efficient R programming - Colin Gillespie (Jumping Rivers | Newcastle University)

36m 23s

Making self-service data science a reality - Matt Brandwein (Cloudera), Tristan Zajonc (Cloudera)

40m 55s

What "50 Years of Data Science" leaves out - Sean Owen (Cloudera)

30m 31s

Faster deep learning solutions from training to inference - Nir Lotan (Intel), Barak Rozenwax (Intel)

35m 55s

Fighting bad guys with data science - Jonathon Morgan (New Knowledge)

44m 9s

Chapter: Visualization & user experience

Create interactive maps in seconds with R and Leaflet - Jeroen Janssens (Data Science Workshops)

42m 7s

Visualizing the health of the internet with Measurement Lab - Irene Ros (Bocoup)

36m 45s

Chapter: Spark & beyond

A behind-the-scenes look into Spark's API and engine evolutions - Reynold Xin (Databricks)

41m 29s

Debugging Apache Spark - Holden Karau (IBM)

44m 9s

Spark machine-learning pipelines: The good, the bad, and the ugly - Vincent Van Steenbergen (w00t data)

32m 18s

How to secure Apache Spark? - Neelesh Srinivas Salian (Stitch Fix)

28m 8s

Chapter: Hardcore Data Science

Learning the relationships between time series metrics at scale; or, Why you can never find a taxi in the rain - Ira Cohen (Anodot)

32m 7s

Inferring the effect of an event using CausalImpact - Kay Brodersen (Google)

29m 48s

Reliable prediction: Handling uncertainty - Robin Senge (inovex GmbH)

30m 43s

Chapter: Hadoop platform and applications

Apache Kylin use cases in China - Luke Han (Kyligence)

39m 54s

Tuning Impala: The top five performance optimizations for the best BI and SQL analytics on Hadoop - Marcel Kornacker (Cloudera), Mostafa Mokhtar (Cloudera)

33m 9s

Creating real-time, data-centric applications with Impala and Kudu - Marcel Kornacker (Cloudera)

40m 15s

Chapter: Data engineering and architecture

Building a modern data architecture for scale - Ben Sharma (Zaloni)

33m 43s

Automated data exploration: Building efficient analysis pipelines with dask - Victor Zabalza (ASI Data Science)

40m 33s

Creating a virtual data lake with Apache Arrow - Tomer Shiran (Dremio), Jacques Nadeau (Dremio)

41m 38s

Presto: Distributed SQL done faster - Wojciech Biela (Teradata), Łukasz Osipiuk (Teradata)

41m 40s

Performance and security: A tale of two cities - Rekha Joshi (Intuit)

43m 5s

Chapter: Big data and the Cloud

How to optimally run Cloudera batch data engineering workflows in AWS - Andrei Savu (Cloudera), Philip Langdale (Cloudera)

41m 42s

Building containerized Spark on a solid foundation with Quobyte and Kubernetes - Daniel Bäurer (inovex GmbH), Sascha Askani (inovex GmbH)

39m 7s

Journey to AWS: Straddling two worlds - Calum Murray (Intuit)

38m 36s

Chapter: Stream processing and analytics

Speeding up Twitter Heron streaming by 5x - Sanjeev Kulkarni (Streamlio), Maosong Fu (Twitter)

39m 5s

Unified stateful big data processing in Apache Beam (incubating) - Aljoscha Krettek (data Artisans)

40m 15s

Elastic streams: Dynamic data redistribution in Apache Kafka - Ben Stopford (Confluent), Ismael Juma (Confluent)

41m 25s

Stream all the things! - Dean Wampler (Lightbend)

31m 24s

Stream analytics with SQL on Apache Flink - Fabian Hueske (data Artisans)

38m 5s

Chapter: Law, ethics, governance

Data citizenship: The next stage of data governance - Antonio Alvarez (Santander Group), Lidia Crespo (Santander UK)

42m 12s

GDPR, data privacy, anonymization, minimization... oh my! - Steve Touw (Immuta)

42m 35s

Chapter: Data 101

Making a change: Digital transformation and organizational culture - Ellen Friedman (Independent)

35m 31s

Cloudy with a chance of on-prem - Jim Scott (MapR Technologies, Inc.)

28m 54s

Chapter: Platform Security and Cybersecurity

Safeguarding electronic stock trading: Challenges and key lessons in network security - Graham Ahearne (Corvil), Fergal Toomey (Corvil)

43m 5s

Machine learning to "spot" cybersecurity incidents at scale - Eddie Garcia (Cloudera)

40m 58s

Speed up big data encryption in Apache Hadoop and Spark - Haifeng Chen (Intel)

29m 58s

Chapter: Enterprise adoption

Data science governance: What and how - Andy Petrella (Kensu)

39m 55s

Chapter: Sensors, IOT & Industrial Internet

Daddy, what color is that airplane overhead, and where is it going? - Hellmar Becker (Hortonworks), Jorn Eilander (ING)

38m 30s

Chapter: Emerging Technologies

Algorithmic regulation - Daniele Quercia (Bell Labs), Giovanni Quattrone (UCL)

39m 51s

Chapter: Tutorials

Fast and effective training for deep learning - David Barber (Department of Computer Science, UCL)

27m 9s

Challenges in commercializing deep learning - Eduard Vazquez (Cortexica Vision Systems)

25m 12s

Ensembles in deep learning with Toupee - Alan Mosca (Sendence | Birkbeck, University of London)

27m 11s

Deep learning in commodities markets - Aida Mehonic (ASI Data Science)

26m 15s

Gaining additional labels for data: An introduction to using semisupervised learning for real problems - Yingsong Zhang (ASI Data Science)

27m 28s

Machine-learning algorithms: What they do and when to use them - Darren Cook (QQ Trend Ltd.)

29m 15s

Building your first big data application on AWS - Ian Meyers (Amazon Web Services (AWS)), Pratim Das (Amazon Web Services (AWS)), Ian Robinson (Amazon Web Services (AWS))

44m 13s

Spark camp: Apache Spark 2.0 for analytics and text mining with Spark ML - Stephane Rion (Big Data Partnership) - Part 1

14m 22s

Spark camp: Apache Spark 2.0 for analytics and text mining with Spark ML - Stephane Rion (Big Data Partnership) - Part 2

01m 14s

Spark camp: Apache Spark 2.0 for analytics and text mining with Spark ML - Stephane Rion (Big Data Partnership) - Part 3

35m 49s

Spark camp: Apache Spark 2.0 for analytics and text mining with Spark ML - Stephane Rion (Big Data Partnership) - Part 4

52m 27s

Distributed deep learning on AWS using Apache MXNet - Anima Anandkumar (UC Irvine) - Part 1

30m 41s

Distributed deep learning on AWS using Apache MXNet - Anima Anandkumar (UC Irvine) - Part 2

29m 45s

Practical machine learning with Python - Charlotte Werger (ASI Data Science) - Part 1

57m 20s

Practical machine learning with Python - Charlotte Werger (ASI Data Science) - Part 2

46m 5s

Deploying and managing Hive, Spark, and Impala in the public cloud - David Tishgart (Cloudera), Philip Langdale (Cloudera), Eugene Fratkin (Cloudera), Jennifer Wu (Cloudera) - Part 1

31m 37s

Deploying and managing Hive, Spark, and Impala in the public cloud - David Tishgart (Cloudera), Philip Langdale (Cloudera), Eugene Fratkin (Cloudera), Jennifer Wu (Cloudera) - Part 2

57m 39s

Deep learning for object detection and neural network deployment - Alison Lowndes (NVIDIA) - Part 1

48m 15s

Deep learning for object detection and neural network deployment - Alison Lowndes (NVIDIA) - Part 2

27m 10s

Discover the business value of open data - Majken Sander (TimeXtender)

31m 20s

10 ways your data project is going to fail and how to prevent it - Martin Goodson (Evolution AI)

28m 32s

Growing a data-driven organization at easyJet - Alberto Rey (easyJet PLC)

33m 44s

Big data at Cox Automotive: Delivering actionable insights to transform the way the world buys, sells, and owns vehicles - Allison Nau (Cox Automotive UK)

33m 30s

Interactive data visualizations using Visdown - Amit Kapoor (narrativeVIZ Consulting), Bargava Subramanian (Red Hat) - Part 1

22m 37s

Interactive data visualizations using Visdown - Amit Kapoor (narrativeVIZ Consulting), Bargava Subramanian (Red Hat) - Part 2

28m 1s

Architecting and building enterprise-class Spark and Hadoop in cloud environments - John Mikula (Google Cloud)

28m 22s

Architecting a next-generation data platform - Jonathan Seidman (Cloudera), Mark Grover (Cloudera), Ted Malaska (Blizzard) - Part 1

25m 15s

Architecting a next-generation data platform - Jonathan Seidman (Cloudera), Mark Grover (Cloudera), Ted Malaska (Blizzard) - Part 2

30m 57s

Just enough Scala for Spark - Dean Wampler (Lightbend) - Part 1

29m 57s

Just enough Scala for Spark - Dean Wampler (Lightbend) - Part 2

20m 40s

Spark and R with sparklyr - Douglas Ashton (Mango Solutions), Aimee Gott (Mango Solutions), Mark Sellors (Mango Solutions) - Part 1

04m 30s

Spark and R with sparklyr - Douglas Ashton (Mango Solutions), Aimee Gott (Mango Solutions), Mark Sellors (Mango Solutions) - Part 2

21m 19s

Transport for London: Using data to keep London moving - Sriskandarajah Suhothayan (WSO2), Roland Major (Transport for London)

31m 12s

How Apache Spark and AWS Lambda empower researchers to identify disease-causing mutations and engineer healthier genomes - Denis C. Bauer (Commonwealth Scientific and Industrial Research Organisation)

31m 16s

A practitioner’s guide to securing your Hadoop cluster - Mark Donsky (Cloudera), Andre Araujo (Cloudera), Mubashir Kazia (Cloudera), Syed Rafice (Cloudera) - Part 1

33m 17s

A practitioner’s guide to securing your Hadoop cluster - Mark Donsky (Cloudera), Andre Araujo (Cloudera), Mubashir Kazia (Cloudera), Syed Rafice (Cloudera) - Part 2

33m 34s

Real-time data pipelines with Apache Kafka - Tim Berglund (Confluent) - Part 1

17m 24s

Real-time data pipelines with Apache Kafka - Tim Berglund (Confluent) - Part 2

02m 50s

Unraveling data with Spark using machine learning - Jeffrey Shmain (Cloudera), Jayant Shekhar (Sparkflows Inc.), Vartika Singh (Cloudera) - Part 1

31m 38s

Unraveling data with Spark using machine learning - Jeffrey Shmain (Cloudera), Jayant Shekhar (Sparkflows Inc.), Vartika Singh (Cloudera) - Part 2

18m 41s

Developing a modern enterprise data strategy - Scott Kurth (Silicon Valley Data Science), John Akred (Silicon Valley Data Science) - Part 1

28m 37s

Developing a modern enterprise data strategy - Scott Kurth (Silicon Valley Data Science), John Akred (Silicon Valley Data Science) - Part 2

30m 43s

Architecting a data platform - John Akred (Silicon Valley Data Science), Stephen O'Sullivan (Silicon Valley Data Science) - Part 1

31m 21s

Architecting a data platform - John Akred (Silicon Valley Data Science), Stephen O'Sullivan (Silicon Valley Data Science) - Part 2

38m 30s

Big data science, the IoT, and the transportation sector - Wael Elrifai (Pentaho)

31m 11s

Chapter: Multiple Topics

Driving business value: Predicting piston ring failures in massive vessels - Mads Ingwar (Think Big), Eliano Marques (Think Big)

34m 8s

Building deep learning-powered big data - Radhika Rangarajan (Intel)

33m 49s

Meta-data science: When all the world's data scientists are just not enough - Leah McGuire (Salesforce)

35m 37s

Executive Briefing: Advanced analytics in the cloud - Jerry Overton (DXC)

39m 34s

Executive Briefing: Cloud strategy - Manuel Sevilla (Capgemini)

40m 27s

Executive Briefing: Data governance and evolving privacy legislation: Daring to move beyond compliance - Aurélie Pols (Mind Your Group by Mind Your Privacy)

47m 18s

Mister P: Imputing granularity from your data - Rumman Chowdhury (Accenture)

27m 47s

Distributed deep learning at scale on Apache Spark with BigDL - Ding Ding (Intel)

24m 39s

A deep dive into Spark SQL's Catalyst optimizer - Herman van Hövell tot Westerflier (Databricks)

34m 15s

Organizing the data lake - Mark Madsen (Third Nature)

43m 22s

Executive Briefing: Dealing with device data - Mark Madsen (Third Nature)

43m 23s

Software industrialization meets big data at Goldman Sachs - Colin White (Goldman Sachs)

33m 23s

Introduction to Alluxio (formerly Tachyon) and how it brings up to 300x performance improvement to Qunar’s streaming processing - Xueyan Li (Qunar), Yupeng Fu (Alluxio)

36m 3s

Realizing the promise of portability with Apache Beam - Tyler Akidau (Google)

41m 45s

Dask: Flexible analytic computing for Python - Matthew Rocklin (Continuum)

37m 41s

Is finance ready for AI? - Aida Mehonic (ASI Data Science)

27m 51s

Artificial intelligence in the enterprise - Martin Goodson (Evolution AI), Andrew Crisp (Dun & Bradstreet)

36m 23s

Driving the next wave of data lineage with automation, visualization, and interaction - Sean Kandel (Trifacta)

43m 52s

Computable content: Notebooks, containers, and data-centric organizational learning - Paco Nathan (O'Reilly Media)

43m 32s

Continuous analytics: Integrating the data hub in a DevOps pipeline - Arturo Bayo (Synergic Partners), Alvaro Fernandez Velando (Santander Spain)

37m 23s

Classifying restaurant pictures: An API with Spark and Slider - Mireia Alos Palop (Teradata), Natalino Busa (Teradata)

41m 20s

Executive Briefing: Analytics centers of excellence as a way to accelerate big data adoption by business - Carme Artigas (Synergic Partners)

33m 28s

Three years into creating value at ING Wholesale Banking with big data, advanced analytics, and artificial intelligence - Doron Reuter (ING)

28m 24s

The state of Spark in the cloud - Nicolas Poggi (Barcelona Supercomputing-Microsoft Research Center)

33m 1s

Surveillance and monitoring - Tanvi Singh (Credit Suisse)

29m 42s

Rethinking stream processing with Apache Kafka: Applications versus clusters and streams versus databases - Michael Noll (Confluent)

40m 22s

How knowledge graphs can help dramatically improve recommendations - Aurélien Géron (Kiwisoft)

42m 45s

Hadoop and object stores: Can we do it better? - Trent Gray-Donald (IBM), Gil Vernik (IBM)

41m 38s

Near-real-time ingest with Apache Flume and Apache Kafka at 1 million-events-per-second scale - Tristan Stevens (Cloudera)

40m 53s

How do you help charities do data? - Duncan Ross (TES Global), Emma Prest (DataKind)

43m 52s

Hadoop as a service: How to build and operate an enterprise data lake supporting operational and streaming analytics - Phillip Radley (BT)

46m 29s

Building a scalable recommendation engine with Spark and Elasticsearch - Seth Hendrickson (Cloudera)

40m 14s

Real-time machine learning with Redis, Apache Spark, TensorFlow, and more - Kamran Yousaf (Redis Labs)

33m 5s

EU GDPR as an opportunity to address both big data security and compliance - Eric Tilenius (BlueTalon)

35m 37s

Identifying and exploiting the keys to digital transformation - Jack Norris (MapR Technologies)

40m 25s

Speeding up machine-learning applications with the LightGBM library in real-time domains - Mathew Salvaris (Microsoft), Miguel Gonzalez-Fierro (Microsoft)

24m 47s

Making recommendations using graphs and Spark - Harry Powell (Barclays), Raffael Strassnig (Barclays)

45m 21s

Conversation interfaces for data science models - Galiya Warrier (Microsoft)

37m 16s

How to prevent future accidents in autonomous driving - Dr.-Ing. Michael Nolting (Volkswagen Commercial Vehicles)

48m 10s

Lessons learned working with Spark and Cassandra - Matthias Niehoff (codecentric AG)

33m 34s

Big data governance for the hybrid cloud: Best practices and how-to - Mark Donsky (Cloudera), Vikas Singh (Cloudera)

44m 48s

The future of natural language generation, 2016–2026 - Adam Smith (Automated Insights)

40m 32s

Fast data at ING: Utilizing Kafka, Spark, Flink, and Cassandra for data science and streaming analytics - Bas Geerdink (ING)

38m 27s

What no one tells you about writing a streaming app? - Mark Grover (Cloudera), Ted Malaska (Blizzard)

40m 54s

Spark Structured Streaming for machine learning - Holden Karau (IBM), Seth Hendrickson (Cloudera)

39m 43s

Data wrangling for insurance - Olivier de Garrigues (Trifacta)

23m 46s

From data dinosaurs to data stars in five weeks: Lessons from completing 80 data science projects - Kim Nilsson (Pivigo)

39m 9s

Deploy Spark ML TensorFlow AI models from notebooks to hybrid clouds (including GPUs) - Chris Fregly (PipelineAI)

48m 5s

Mastering computer vision problems with state-of-the-art deep learning architectures, MXNet, and GPU virtual machines - Miguel Gonzalez-Fierro (Microsoft)

38m 5s

Big data computations: Comparing Apache HAWQ, Druid, and GPU databases - Dr. Vijay Srinivas Agneeswaran (SapientNitro)

45m 9s

The business case for deep learning, Spark, and friends - Sanjay Mathur (Silicon Valley Data Science)

29m 2s

The IoT is driving the need for more secure big data analytics - Brendan Rizzo (HPE)

35m 45s

What's your data worth? - John Akred (Silicon Valley Data Science)

43m 16s

"Smartifying" the game - Iñaki Puigdollers (Social Point)

29m 5s

A wealth of information leads to a poverty of attention: Why adopting the cloud can help you stay focused on the right things - Yuval Dvir (Google)

39m 50s