Hardcore Data Science California 2015

Video description

Push the envelope of data science by exploring emerging topics such as data management, machine learning, natural language processing, crowdsourcing, and algorithm design with this O’Reilly video collection—taken from the Hardcore Data Science sessions at Strata + Hadoop World 2015 in San Jose, California.

This video collection includes:

Beyond DNNs towards New Architectures for Deep Learning, with Applications to Large Vocabulary Continuous Speech Recognition
Tara Sainath, Researcher, Google
Deep neural networks (DNNs) were first explored for acoustic modeling, where numerous research labs demonstrated relative improvements in word error rate (WER) of 10-40%. This session provides an overview of the improvements in deep learning made across various research labs since then.

On the Computational and Statistical Interface and "Big Data"
Michael Jordan, Professor, UC Berkeley
How does statistical decision theory provide a mathematical point of departure for blending computational and statistical thinking? In this session, you’ll learn about the theoretical tradeoffs between statistical risk, amount of data, and “externalities” such as computation, communication, and privacy.

Interpretable Machine Learning in Practice
Maya Gupta, Research and Development Manager, Google
What makes a large machine learning system more interpretable and robust in practice? This session discusses the importance of monotonicity, smoothness, semantically meaningful inputs and outputs, and designing algorithms that are easy to debug.

Visual Understanding Beyond Naming
Alyosha Efros, Associate Professor, UC Berkeley
This session describes some of the efforts to bypass the “language bottleneck” and use other information to help in visual understanding and visual data mining.

Finding Repeated Structure in Time Series Data: Commercial and Scientific Opportunities
Eamonn Keogh, Professor, University of California - Riverside
In this session, Eamonn argues that, compared with other types of data (text, social networks, etc.), time series data is underexploited, and that many opportunities are available for novel commercial applications and scientific discoveries.

Tensor Methods for Large-scale Unsupervised Learning: Applications to Topic and Community Modeling
Anima Anandkumar, Faculty member, UC Irvine
Understand how to exploit tensor methods for learning. Tensors are higher-order generalizations of matrices and are useful for representing rich information structures. Tensor factorization involves finding a compact representation of the tensor using simple linear and multilinear algebra.

High Performance Machine Learning through Codesign and Rooflining
John Canny, Professor, UC Berkeley
How fast can machine learning (ML) and graph algorithms be? BIDMach is a toolkit for machine learning that uses rooflined design and GPUs to achieve improvements of one to three orders of magnitude over other toolkits on single machines, larger than those reported for cluster systems running on hundreds of nodes for common ML tasks.

A Quest for Visual Intelligence in Computers
Fei-Fei Li, Stanford University
Look into computer vision technology, including ongoing projects in large-scale object recognition and visual scene storytelling from the Stanford Vision Lab.

Graph Mining for Log Data
David Andrzejewski, Lead Data Sciences Engineer, Sumo Logic
Many of the millions of events logged inside a given software system are not isolated occurrences, but rather links in richly interconnected causal chains. This session reveals how graph-mining techniques can surface high-value insights from the relationships between logged events.

Why Julia is Important for Data Science
John Myles White, Data Scientist, Facebook
In this session, you’ll learn the ways in which Julia improves upon the current generation of languages used for data science.

Drugs, DNA, and Dinosaurs: Building High Quality Knowledge Bases with DeepDive
Chris Ré, Assistant Professor, Stanford University
Learn how DeepDive is used in a range of tasks, from diagnosing rare diseases to drug repurposing to filling out the tree of life. DeepDive helps to create knowledge bases that meet, and sometimes exceed, human-level quality and to perform predictive analytics on top of this data.

Product information

  • Title: Hardcore Data Science California 2015
  • Author(s):
  • Release date: June 2015
  • Publisher(s): O'Reilly Media, Inc.
  • ISBN: 9781491931080