Strata + Hadoop World 2017 - San Jose, California

Video description

Strata + Hadoop World San Jose 2017 gathered 325 of the globe's leading minds in technology and business to demonstrate how big data, machine learning, and analytics are changing not only business, but society itself. This video compilation provides a complete recording of each of the conference's 167 technical sessions, 23 long-form tutorials, and 17 keynotes. Some of the featured speakers you'll hear from include: Confluent CEO Jay Kreps on stream processing and its impact on how businesses deal with real-time data; Amazon Ad platform leader Alice Zheng on the best feature engineering methods for machine learning pipelines; Eric Colson of Stitch Fix on how to build a great data science team; DataVisor CEO Yinglian Xie on how Spark's in-memory big data security analytics can identify nefarious sleeper cells; and Pinterest Chief Scientist Jure Leskovec on Pixie, the graph-based system that makes personalized recommendations to 100+ million users in real time.

Get this compilation and you'll enjoy unfettered access to the Strata Business Summit, a set of 29 carefully curated sessions specifically tailored for the C-level business executive. Taught by top data strategists and thinkers at Silicon Valley Data Science, MapR Technologies, LinkedIn, Unisys, UC Berkeley, and Deloitte Touche Consulting, and by VCs at Kleiner Perkins Caufield & Byers, the Summit is like an MBA in data-driven business. You'll receive a hand-picked lineup of executive briefings on key issues such as predictive analytics and machine learning, cloud strategy, governance, security and privacy, IoT, and artificial intelligence.

The 23 tutorials included in the compilation cover big data topics such as a review of Apache Spark 2.0 core concepts; an exploration of stream processing from the basics through Apache Beam; a practical look at how to do scalable, end-to-end data science in R on single machines and on Spark clusters; overviews of how to get started in TensorFlow, architect a data platform, use Scala with Spark, build data applications in AWS, build a data pipeline with Kafka, and secure your Hadoop clusters; and how to visualize large, complex datasets with R, Hadoop, and Spark. Each of the conference's 17 keynote sessions is included, as well as all of the 167 specialized sessions, covering topics such as PyTorch, a flexible and intuitive framework for deep learning; Docker on YARN; Spark Structured Streaming; the Netflix data platform; RubiX, a caching framework for big data engines in the cloud; Stanford University's Weld, an optimizing runtime for high-performance data analytics; and much, much more.

  • Learn from 325 of the world's top thinker-doers in big data, machine learning, and analytics
  • Enjoy a center-row view of each of the conference's 167 sessions, 23 tutorials, and 17 keynotes
  • Gain total access to the Strata Business Summit – 29 sessions tailored for the business strategist
  • Hear how top companies like Comcast, American Express, and ING built their data strategies
  • See Data 101 – a comprehensive tutorial covering the core principles of data architecture
  • Watch the mindbender between Pokémon GO creator Phil Keslin and neuroscientist Beau Cronin
  • Learn from Cloudera's Hadoop experts on data governance, Spark, Kudu, and the Cloud
  • See how IBM implements deep learning to predict breast cancer proliferation scores
  • Get intensives on Hadoop cluster security, D3 visualization, and using R for scalable data analytics
  • Hear Google explain machine learning, TensorFlow, and Apache Beam stream processing
  • Enjoy a comprehensive recording with 200+ hours of material to explore at your own pace

Table of contents

  1. Keynotes
    1. The machine-learning renaissance Mike Olson (Cloudera)
    2. Applying data and machine learning to scale education Daphne Koller (Calico Labs | Coursera)
    3. Turning the internet upside down: Driving big data right to the edge (sponsored by MapR) Ted Dunning (MapR Technologies)
    4. Launching Pokémon GO Phil Keslin (Niantic, Inc.), Beau Cronin (Embedding.js)
    5. Machines and the magic of fast learning (sponsored by MemSQL) Eric Frenkiel (MemSQL)
    6. Becoming smarter about credible news Tom Reilly (Cloudera), Khalid Al-Kofahi (Thomson Reuters)
    7. Making good robots Andra Keay (Silicon Valley Robotics)
    8. Big data, AI, the genome, and everything (sponsored by Microsoft) Vijay Narayanan (Microsoft)
    9. Ray: A Distributed Execution Framework for Emerging AI Applications - Michael Jordan (UC Berkeley)
    10. Driving enterprise open source adoption, from data lake to AI (sponsored by Teradata) Ron Bodkin (Think Big Analytics)
    11. Data in disasters: Saving lives and innovating in real time Desiree Matel-Anderson (The Field Innovation Team)
    12. Machine learning is about your data and deployment, not just model development (sponsored by IBM) Dinesh Nirmal (IBM)
    13. Machine learning at Google (sponsored by Google) Rob Craft (Google)
  2. Data 101
    1. The business case for deep learning, Spark, and friends - Edd Wilder-James (Silicon Valley Data Science)
    2. Why stream? The advantages of working with streaming data - Ellen Friedman (Independent)
    3. Cloudy with a chance of on-prem - Jim Scott (MapR Technologies, Inc.)
    4. Stats: What you need to know - Gabriela de Queiroz (R-Ladies)
    5. What is AI? - Melanie Warrick (Skymind)
    6. Visualization without guesswork - Aneesh Karve (Quilt Data, Inc)
  3. Big data and the Cloud
    1. Architecting and building enterprise-class Spark and Hadoop in cloud environments - James Malone (Google), John Mikula (Google Cloud) - Part 1
    2. Architecting and building enterprise-class Spark and Hadoop in cloud environments - James Malone (Google), John Mikula (Google Cloud) - Part 2
    3. Architecting and building enterprise-class Spark and Hadoop in cloud environments - James Malone (Google), John Mikula (Google Cloud) - Part 3
    4. Architecting and building enterprise-class Spark and Hadoop in cloud environments - James Malone (Google), John Mikula (Google Cloud) - Part 4
    5. Moving big data as a service to a multicloud world - Sriram Ganesan (Qubole), Prakhar Jain (Qubole)
    6. BI and SQL analytics with Hadoop in the cloud - Henry Robinson (Cloudera), Alex Gutow (Cloudera)
    7. Running a Cloudera cluster in production on Azure - Paige Liu (Microsoft), John Zhuge (Cloudera)
    8. RubiX: A caching framework for big data engines in the cloud - Shubham Tagra (Qubole)
    9. The enterprise geospatial platform: A perfect fusion of cloud and open source technologies - Naghman Waheed (Monsanto), Martin Mendez-Costabel (Monsanto)
    10. Practical considerations for running Spark workloads in the cloud - Anand Iyer (Cloudera), Eugene Fratkin (Cloudera)
    11. Alluxio (formerly Tachyon): The journey thus far and the road ahead - Haoyuan Li (Alluxio), Calvin Jia (Alluxio)
  4. Data science and advanced analytics
    1. Getting started with TensorFlow - Amy Unruh (Google) and Yufeng Guo (Google) - Part 1
    2. Getting started with TensorFlow - Amy Unruh (Google) and Yufeng Guo (Google) - Part 2
    3. Getting started with TensorFlow - Amy Unruh (Google) and Yufeng Guo (Google) - Part 3
    4. Getting started with TensorFlow - Amy Unruh (Google) and Yufeng Guo (Google) - Part 4
    5. Guerrilla guide to Python and Apache Hadoop - Juliet Hougland (Cloudera) - Part 1
    6. Guerrilla guide to Python and Apache Hadoop - Juliet Hougland (Cloudera) - Part 2
    7. Guerrilla guide to Python and Apache Hadoop - Juliet Hougland (Cloudera) - Part 3
    8. Modeling big data with R, sparklyr, and Apache Spark - John Mount (Win-Vector LLC) - Part 1
    9. Modeling big data with R, sparklyr, and Apache Spark - John Mount (Win-Vector LLC) - Part 2
    10. Modeling big data with R, sparklyr, and Apache Spark - John Mount (Win-Vector LLC) - Part 3
    11. Modeling big data with R, sparklyr, and Apache Spark - John Mount (Win-Vector LLC) - Part 4
    12. Scalable deep learning for the enterprise with DL4J - Tom Hanlon (Skymind), Dave Kale (Skymind), Susan Eraly (Skymind), and Josh Patterson (Skymind) - Part 1
    13. Scalable deep learning for the enterprise with DL4J - Tom Hanlon (Skymind), Dave Kale (Skymind), Susan Eraly (Skymind), and Josh Patterson (Skymind) - Part 2
    14. Scalable deep learning for the enterprise with DL4J - Tom Hanlon (Skymind), Dave Kale (Skymind), Susan Eraly (Skymind), and Josh Patterson (Skymind) - Part 3
    15. Uber's data science workbench - Peng Du (Uber Inc.) and Randy Wei (Uber Inc.)
    16. How Microsoft predicts churn of cloud customers using deep learning and explains those predictions in an interpretable way - Feng Zhu (Microsoft), Valentine Fontama (Microsoft)
    17. Intelligent pattern profiling on semistructured data with machine learning - Sean Kandel (Trifacta), Karthik Sethuraman (Trifacta)
    18. Squeezing deep learning onto mobile phones - Anirudh Koul (Microsoft)
    19. Recommending 1+ billion items to 100+ million users in real time: Harnessing the structure of the user-to-object graph to extract ranking signals at scale - Jure Leskovec (Pinterest)
    20. Semantic natural language understanding at scale using Spark, machine-learned annotators, and deep-learned ontologies - David Talby (Atigeo), Claudiu Branzan (G2 Web Services)
    21. Leveraging deep learning to predict breast cancer proliferation scores with Apache Spark and Apache SystemML - Michael Dusenberry (IBM Spark Technology Center), Frederick Reiss (IBM Spark Technology Center)
    22. PyTorch: A flexible and intuitive framework for deep learning - James Bradbury (Salesforce Research)
    23. The dangers of statistical significance when studying weak effects in big data: From natural experiments to p-hacking - Robert Grossman (University of Chicago)
    24. Tensor abuse in the workplace - Ted Dunning (MapR Technologies)
    25. The frontiers of attention and memory in neural networks - Stephen Merity (Salesforce Research)
    26. Automatic speaker segmentation: Using machine learning to identify who is speaking when - Matar Haller (Winton Capital)
    27. Feature engineering for diverse data types - Alice Zheng (Amazon)
    28. When is data science a house of cards? Replicating data science conclusions - June Andrews (Pinterest), Frances Haugen (Pinterest)
    29. Distributed deep learning on AWS using MXNet - Anima Anandkumar (UC Irvine)
    30. The state of TensorFlow today and where it is headed in 2017 - Rajat Monga (Google)
    31. Clustering user sessions with NLP methods in complex internet applications - Dorna Bandari (Pinterest Inc.)
    32. Weld: An optimizing runtime for high-performance data analytics - Shoumik Palkar (Stanford University)
    33. Learning from incomplete, imperfect data with probabilistic programming - Michael Lee Williams (Fast Forward Labs)
    34. The power of persuasion modeling - Michelangelo D'Agostino (Civis Analytics), Bill Lattner (Civis Analytics)
    35. Making self-service data science a reality - Matt Brandwein (Cloudera), Tristan Zajonc (Cloudera)
    36. The app trap: Why every mobile app needs anomaly detection - Ira Cohen (Anodot)
    37. Predicting customer lifetime value for a subscription-based business - Chao Zhong (Microsoft)
    38. Building a recommender from a big behavior graph over Cassandra - Gleicon Moraes (luc.id), Arthur Grava (Luizalabs)
    39. Seven steps to high-velocity data analytics with DataOps - Christopher Bergh (DataKitchen), Gil Benghiat (DataKitchen)
    40. Machine learning to automate localization with Apache Spark and other open source tools - Michelle Casbon (Qordoba)
    41. Compressed linear algebra in Apache SystemML - Frederick Reiss (IBM Spark Technology Center), Arvind Surve (IBM)
    42. Leveraging open source automated data science tools - Eduardo Arino de la Rubia (Domino Data Lab)
  5. Law, ethics, governance
    1. Executive Briefing: Doing data right—Legal best practices for making your data work - Alysa Z. Hutnik (Kelley Drye Warren LLP), Crystal Skelton (Kelley Drye Warren LLP)
    2. Big data governance for the hybrid cloud: Best practices and how-to - Mark Donsky (Cloudera), Sudhanshu Arora (Cloudera)
    3. Data at risk: Backing up the world's research data - Max Ogden (Independent)
  6. Spark and beyond
    1. Spark camp: Apache Spark 2.0 for analytics and text mining with Spark ML - Andy Konwinski (Databricks) - Part 1
    2. Spark camp: Apache Spark 2.0 for analytics and text mining with Spark ML - Andy Konwinski (Databricks) - Part 2
    3. Spark camp: Apache Spark 2.0 for analytics and text mining with Spark ML - Andy Konwinski (Databricks) - Part 3
    4. Spark camp: Apache Spark 2.0 for analytics and text mining with Spark ML - Andy Konwinski (Databricks) - Part 4
    5. Spark camp: Apache Spark 2.0 for analytics and text mining with Spark ML - Andy Konwinski (Databricks) - Part 5
    6. Architecting a data platform - John Akred (Silicon Valley Data Science), Stephen O'Sullivan (Silicon Valley Data Science) - Part 1
    7. Architecting a data platform - John Akred (Silicon Valley Data Science), Stephen O'Sullivan (Silicon Valley Data Science) - Part 2
    8. Architecting a data platform - John Akred (Silicon Valley Data Science), Stephen O'Sullivan (Silicon Valley Data Science) - Part 3
    9. Architecting a data platform - John Akred (Silicon Valley Data Science), Stephen O'Sullivan (Silicon Valley Data Science) - Part 4
    10. Unraveling data with Spark using machine learning - Vartika Singh (Cloudera), Jayant Shekhar (Sparkflows Inc.), Jeffrey Shmain (Cloudera) - Part 1
    11. Unraveling data with Spark using machine learning - Vartika Singh (Cloudera), Jayant Shekhar (Sparkflows Inc.), Jeffrey Shmain (Cloudera) - Part 2
    12. Unraveling data with Spark using machine learning - Vartika Singh (Cloudera), Jayant Shekhar (Sparkflows Inc.), Jeffrey Shmain (Cloudera) - Part 3
    13. Unraveling data with Spark using machine learning - Vartika Singh (Cloudera), Jayant Shekhar (Sparkflows Inc.), Jeffrey Shmain (Cloudera) - Part 4
    14. Just enough Scala for Spark - Dean Wampler (Lightbend) - Part 1
    15. Just enough Scala for Spark - Dean Wampler (Lightbend) - Part 2
    16. Just enough Scala for Spark - Dean Wampler (Lightbend) - Part 3
    17. Just enough Scala for Spark - Dean Wampler (Lightbend) - Part 4
    18. Zillow: Transforming real estate through big data and machine learning - Jasjeet Thind (Zillow)
    19. Spark Structured Streaming for machine learning - Holden Karau (IBM), Seth Hendrickson (IBM)
    20. Sparklyr: An R interface for Apache Spark - Edgar Ruiz (RStudio)
    21. Spark at scale in Bing: Use cases and lessons learned - Kaarthik Sivashanmugam (Microsoft)
    22. Hoodie: Incremental processing on Hadoop at Uber - Vinoth Chandar (Uber), Prasanna Rajaperumal (Uber)
    23. How Spark can fail or be confusing and what you can do about it - Yin Huai (Databricks)
    24. Debugging Apache Spark - Holden Karau (IBM), Joey Echeverria (Rocana)
    25. Effective Spark with Alluxio - Calvin Jia (Alluxio)
  7. Visualization and user experience
    1. Exploration and visualization of large, complex datasets with R, Hadoop, and Spark - Stephen Elston (Quantia Analytics, LLC), Ryan Hafen (Hafen Consulting) - Part 1
    2. Exploration and visualization of large, complex datasets with R, Hadoop, and Spark - Stephen Elston (Quantia Analytics, LLC), Ryan Hafen (Hafen Consulting) - Part 2
    3. Exploration and visualization of large, complex datasets with R, Hadoop, and Spark - Stephen Elston (Quantia Analytics, LLC), Ryan Hafen (Hafen Consulting) - Part 3
    4. Exploration and visualization of large, complex datasets with R, Hadoop, and Spark - Stephen Elston (Quantia Analytics, LLC), Ryan Hafen (Hafen Consulting) - Part 4
    5. Introduction to visualizations using D3 - Brian Suda (optional.is) - Part 1
    6. Introduction to visualizations using D3 - Brian Suda (optional.is) - Part 2
    7. Introduction to visualizations using D3 - Brian Suda (optional.is) - Part 3
    8. Introduction to visualizations using D3 - Brian Suda (optional.is) - Part 4
    9. Data Science and Design Or, on the unpredictability of the iterative design process - Rumman Chowdhury (Accenture)
    10. Beyond polarization: Data UX for a diversity of workers - Joe Hellerstein (UC Berkeley), Giorgio Caviglia (Trifacta), Alon Bartur (Trifacta)
    11. Bringing data into design: How to craft personalized user experiences - Ricky Hennessy (frog), Charlie Burgoyne (frog)
    12. Why the next wave of data lineage is driven by automation, visualization, and interaction - Sean Kandel (Trifacta)
    13. Building interactive data products for risk measurement and monitoring - Warren Reed (US Treasury’s Office of Financial Research)
  8. Platform security and cybersecurity
    1. A practitioner’s guide to securing your Hadoop cluster - Mark Donsky (Cloudera), Andre Araujo (Cloudera), Michael Yoder (Cloudera), Manish Ahluwalia (Cloudera) - Part 1
    2. A practitioner’s guide to securing your Hadoop cluster - Mark Donsky (Cloudera), Andre Araujo (Cloudera), Michael Yoder (Cloudera), Manish Ahluwalia (Cloudera) - Part 2
    3. A practitioner’s guide to securing your Hadoop cluster - Mark Donsky (Cloudera), Andre Araujo (Cloudera), Michael Yoder (Cloudera), Manish Ahluwalia (Cloudera) - Part 3
    4. A practitioner’s guide to securing your Hadoop cluster - Mark Donsky (Cloudera), Andre Araujo (Cloudera), Michael Yoder (Cloudera), Manish Ahluwalia (Cloudera) - Part 4
    5. Paint the landscape and secure your data center with Apache Spot - Cesar Berho (Intel), Alan Ross (Intel)
    6. Cloudy with a chance of fraud: A look at cloud-hosted attack trends - Ting-Fang Yen (DataVisor)
    7. Pluggable security in Hadoop - Yuliya Feldman (Dremio Corporation)
    8. Don’t sleep on sleeper cells: Using big data to drive detection - Yinglian Xie (DataVisor)
    9. Malicious site detection with large-scale belief propagation - Alexander Ulanov (Hewlett Packard Labs), Manish Marwah (Hewlett Packard Labs)
  9. Data engineering and architecture
    1. Big data for operational insights - Felix Gorodishter (GoDaddy)
    2. Shifting left for continuous quality in an Agile data world - Avinash Padmanabhan (Intuit)
    3. Mistakes were made, but not by us: Lessons from a year of supporting Apache Kafka - Ryan Pridgeon (Confluent), Dustin Cote (Confluent)
    4. Achieving real-time ingestion and analysis of security events through Kafka and Metron - Kevin Mao (Capital One)
    5. The Netflix data platform: Now and in the future - Kurt Brown (Netflix)
    6. Making architecture choices for small and big data problems - Nischal HP (Unnati Data Labs), Raghotham Sripadraj (Unnati Data Labs)
    7. Building a healthy data ecosystem around Kafka and Hadoop: Lessons learned at LinkedIn - Shirshanka Das (LinkedIn), Yael Garten (LinkedIn)
    8. The future of column-oriented data processing with Arrow and Parquet - Julien Le Dem (Dremio), Jacques Nadeau (Dremio)
    9. DevOps for models: How to manage millions of models in production - Teresa Tung (Accenture Labs), Jurgen Weichenberger (Accenture Analytics), Ishmeet Grewal (Accenture Technology Labs)
    10. One cluster does not fit all: Architecture patterns for multicluster Apache Kafka deployments - Gwen Shapira (Confluent)
    11. Deep learning for IT operations intelligence using open source tools - Shivnath Babu (Duke University | Unravel Data Systems)
  10. Sponsored Sessions
    1. Real-time analytics at Uber scale (sponsored by MemSQL) - James Burkhart (Uber)
    2. Ingredients to a successful data analytics project (sponsored by Dell EMC) - Erin Banks (Dell EMC)
    3. Advanced data federation and cost-based optimization using Apache Calcite and Spark SQL (sponsored by DataScience) - Jason Slepicka (DataScience)
    4. Big data analytics accelerating innovation in sports (sponsored by Intel) - Sasi Kuppannagari (Intel Corporation)
    5. Fixing what’s broken: Big data in the enterprise (sponsored by Cask) - Jonathan Gray (Cask)
    6. Machine learning and microservices: A framework for next-gen applications (sponsored by MapR Technologies) - Nitin Bandugula (MapR Technologies)
    7. Building a modern data architecture (sponsored by Zaloni) - Ben Sharma (Zaloni)
    8. Building an automation-driven Lambda architecture (sponsored by BMC) - Darren Chinen (Malwarebytes), Sujay Kulkarni (Malwarebytes), Manjunath Vasishta (Malwarebytes)
    9. Get data lakes, data catalogs, and real-time streams in less time with fewer people and more machine learning (sponsored by Informatica) - Murthy Mathiprakasam (Informatica)
    10. Continuous queries over high-velocity event streams using an in-memory database (sponsored by VoltDB) - Ethan Zhang (VoltDB)
    11. Five steps to a killer data lake, from ingest to machine learning (sponsored by Pentaho) - Mark Burnette (Pentaho, a Hitachi Group Company)
    12. When big data leads to big results (sponsored by Paxata) - Chandhu Yalla (Intel), Nenshad Bardoliwalla (Paxata)
    13. Outsmarting insider threats: Safeguarding your most sensitive assets (sponsored by SAS) - Charlotte Crain (SAS), Tyler Freckman (SAS)
    14. Exploiting Hadoop with artificial intelligence and machine learning (sponsored by DataRobot) - Greg Michaelson (DataRobot)
    15. How Peak Games is building analytics infrastructure to improve user experience (sponsored by Snowflake) - Serdar Sahin (Peak Games)
    16. Building data lakes in the cloud with self-service access (sponsored by Talend) - Eric Anderson (Beachbody), Shyam Konda (Beachbody)
    17. Virtualizing Hadoop and Spark: Architecture, performance, and best practices (sponsored by VMware) - Justin Murray (VMware)
    18. Fregata: TalkingData's lightweight, large-scale machine-learning library on Spark (sponsored by TalkingData) - Xiatian Zhang (TalkingData Ltd.)
    19. Presto: Distributed SQL on anything (sponsored by Teradata) - Kamil Bajda-Pawlikowski (Teradata)
    20. Using big data, the cloud, and AI to enable intelligence at scale (sponsored by Microsoft) - Wee Hyong Tok (Microsoft), Danielle Dean (Microsoft)
    21. Modern big data service architecture: Evolving from cloud-native and serverless to intelligent data clouds (sponsored by Futurewei Technologies) - Luhui Hu (Futurewei Technologies)
    22. Machine learning with Google Cloud Platform (sponsored by Google) - Rob Craft (Google)
    23. Replication as a service (sponsored by WANdisco) - Jagane Sundar (WANdisco)
  11. Ask Me Anything
    1. Ask me anything: Developing a modern enterprise data strategy - John Akred (Silicon Valley Data Science), Scott Kurth (Silicon Valley Data Science), Julie Steele (Silicon Valley Data Science), Stephen O'Sullivan (Silicon Valley Data Science)
    2. Ask me anything: Gwen Shapira - Gwen Shapira (Confluent)
    3. Ask me anything: Apache Beam - Tyler Akidau (Google), Frances Perry (Google), Kenneth Knowles (Google), Slava Chernyak (Google)
    4. Ask me anything: Unraveling data with Spark using machine learning - Vartika Singh (Cloudera), Jayant Shekhar (Sparkflows Inc.), Jeffrey Shmain (Cloudera)
  12. Data Case Studies
    1. Wrangling the vote: Fueling campaign strategies analyzing diverse voter data - Jim Harrold (NationBuilder)
    2. Building a real-time data science service for mobile advertising - Robin Li (Tapjoy), Yohan Chin (Tapjoy)
    3. Data integration and governance for big data with Apache Avro; or, How to solve the GIGO problem - Barbara Eckman (Comcast)
    4. Data monetization: A telecommunications use case - Dirk Jungnickel (Emirates Integrated Telecommunications Company (du))
    5. How a global manufacturing company built a data science capability from scratch - Carlo Torniai (Pirelli Tyre)
    6. Building the “future you” retirement planning service on a Hadoop data lake - Chris Murphy (Zurich Insurance Group) and Martin Lidl (Deloitte)
    7. Data science and critical thinking - Alistair Croll (Solve For Interesting)
    8. New user recommendations at scale: Identifying compelling content for low-signal users using a hybrid-curation approach - Maura Lynch (Pinterest)
    9. Real-time analysis of behavior of law enforcement encounters using big data analytics and deep learning multimodal emotion-recognition models - Nixon Patel (Kovid Group)
    10. Building a streaming analytics solution to provide real-time actionable insights to customers - Bas Geerdink (ING)
    11. Emotion text analytics for deeper understanding and better prediction of irrational human behavior - Lana Novikova (Heartbeat AI Technologies)
  13. Hadoop platform and applications
    1. Apache Kudu: 1.0 and beyond - Todd Lipcon (Cloudera)
    2. Docker on YARN - Daniel Templeton (Cloudera)
    3. How to leverage your private cloud infrastructure to deploy Hadoop - Dwai Lahiri (Cloudera)
    4. Creating real-time, data-centric applications with Impala and Kudu - Todd Lipcon (Cloudera), Marcel Kornacker (Cloudera)
    5. Apache Kylin 2.0: From classic OLAP to real-time data warehouse - Yang Li (Kyligence)
  14. Data, transportation, and logistics
    1. Big data opportunities in next-generation mobility - Evangelos Simoudis (Synapse Partners)
    2. Transforming cities with Mapbox and open data - Ryan Baumann (Mapbox)
    3. The IoT and the autonomous vehicle in the clouds: Simultaneous localization and mapping (SLAM) with Kafka and Spark Streaming - Jay White Bear (IBM)
    4. Transport for London: Using data to keep London moving - Roland Major (Transport for London)
    5. Greasing the Wheels of International Logistics - Rajiv Paul (Yakit)
    6. Machine-learning opportunities within the airline industry - Rodrigo Fontecilla (Unisys)
    7. How Vnomics built and deployed a “digital twin” in commercial trucking that led to $160M (and counting) in verified operational fuel savings - Lloyd Palum (Vnomics)
    8. How Lufthansa German Airlines is using data analytics to create the next level of customer experience - Andreas Ribbrock (#zeroG, A Lufthansa Systems Company)
  15. Stream processing and analytics
    1. Learn stream processing with Apache Beam - Frances Perry (Google), Tyler Akidau (Google), Ken Knowles (Google) - Part 1
    2. Learn stream processing with Apache Beam - Frances Perry (Google), Tyler Akidau (Google), Ken Knowles (Google) - Part 2
    3. Building real-time data pipelines with Apache Kafka - Ian Wrigley (Confluent) - Part 1
    4. Building real-time data pipelines with Apache Kafka - Ian Wrigley (Confluent) - Part 2
    5. Building real-time data pipelines with Apache Kafka - Ian Wrigley (Confluent) - Part 3
    6. Building real-time data pipelines with Apache Kafka - Ian Wrigley (Confluent) - Part 4
    7. Unified, portable, efficient: Batch and stream processing with Apache Beam (incubating) - Kenneth Knowles (Google)
    8. The rise of real time: Apache Kafka and the streaming revolution - Jay Kreps (Confluent)
    9. Amazon Kinesis data streaming services - Roger Barga (Amazon Web Services)
    10. The evolution of massive-scale data processing - Tyler Akidau (Google)
    11. From rivulets to rivers: Elastic stream processing in Heron - Bill Graham (Twitter), Avrilia Floratou (Microsoft), Ashvin Agrawal (Microsoft)
    12. Watermarks: Time and progress in Apache Beam (incubating) and beyond - Slava Chernyak (Google)
    13. Developing streaming applications with Apache Apex - David Yan (DataTorrent, Inc.)
  16. Sensors, IoT, and Industrial Internet
    1. Robot farmers and chefs: In the field and in your kitchen - Tim Gasper (Bitfusion)
    2. Individualized care driven by wearable data and real-time analytics - Julie Lockner (17 Minds Corporation)
  17. Real-time applications
    1. Processing millions of events per second without breaking the bank - Kartik Paramasivam (LinkedIn)
    2. Real-time analytics using Kudu at petabyte scale - Sridhar Alla (Comcast), Shekhar Agrawal (Comcast)
    3. Building reliable real-time services with Apache DistributedLog - Sijie Guo (Twitter)
    4. Designing a time series database to support IoT workloads - Michael Freedman (Timescale | Princeton University)
    5. The common anomaly detection platform at Microsoft - Tony Xing (Microsoft)
    6. Streams: Successfully transforming your business one millisecond at a time - Manny Puentes (Rebel AI)
    7. Graph-based anomaly detection: When and how - Jeffrey Yau (Silicon Valley Data Science)
  18. Data-driven business management
    1. Developing a modern enterprise data strategy - Edd Wilder-James (Silicon Valley Data Science), Scott Kurth (Silicon Valley Data Science) - Part 1
    2. Developing a modern enterprise data strategy - Edd Wilder-James (Silicon Valley Data Science), Scott Kurth (Silicon Valley Data Science) - Part 2
    3. Developing a modern enterprise data strategy - Edd Wilder-James (Silicon Valley Data Science), Scott Kurth (Silicon Valley Data Science) - Part 3
    4. Developing a modern enterprise data strategy - Edd Wilder-James (Silicon Valley Data Science), Scott Kurth (Silicon Valley Data Science) - Part 4
    5. Determining the economic value of your data - William Schmarzo (Dell EMC) - Part 1
    6. Determining the economic value of your data - William Schmarzo (Dell EMC) - Part 2
    7. Determining the economic value of your data - William Schmarzo (Dell EMC) - Part 3
    8. Determining the economic value of your data - William Schmarzo (Dell EMC) - Part 4
    9. The main event: Identifying and exploiting the keys to digital transformation - Jack Norris (MapR Technologies)
    10. How we work: The unspoken challenges of doing data science - Yael Garten (LinkedIn)
    11. Data-driven innovation - Gillian Docherty (The Data Lab)
    12. Organizing for data science: Some unintuitive lessons learned for unlocking value - Eric Colson (Stitch Fix)
    13. The programmable enterprise: Software is central to innovation - Robert Cohen (Economic Strategy Institute)
  19. The Solutions Showcase Theater
    1. Cloudera and SAS: Leaders Coming Together - Clark Bradley, Principal Technical Architect (Cloudera)
    2. Streaming and Microservices for Fast Data - Dale Kim, Sr. Director, Industry Solutions (MapR)
    3. Scoring Machine Learning Models at Scale - John Bowler, Software Engineer (MemSQL)
    4. SAP Vehicles Network helps Hertz and Mojio improve customer experience - Steven Kim, Sr. Director, Connected Vehicles (SAP)
    5. Thinking Data Lakes – From build to operate - Prakul Sharma, Senior Manager (Deloitte)
    6. How Industry Leader Speeds Blends Data for Daily Insights - Nicolas Morales, Sr. Director, Technical Sales Solutions (Clearstory Data)
    7. Water Mission’s solar-powered Living Water™ Treatment Systems bring clean, safe water to thousands of communities - Andrea Braida, Portfolio Marketing Manager (IBM)
    8. Building and Shipping Models That Really Work! - Ali Marami, Chief Data Scientist (R-Brain)
    9. Achieving Efficient Analytics and Management of Indexes - Munir Bondre, CTO (Fuzzy Logix)
    10. How Spotify moved from one of Europe's largest on-prem Hadoop clusters to Google Cloud - William Vambenepe, Senior Product Manager (Google)
    11. Moving complex retail analytics onto Hadoop - Sharon Kirkham, VP Analytics and Consultancy (Kognitio)
    12. Hyper-Acceleration of Big Data Workloads with FPGAs - Roop Ganguly, Solution Architect (Bigstream Solutions)
    13. The Self-Service Platform for Data Engineering and Data Science - Lovan Chetty, Director of Product Management (Cazena)
    14. How to Automate Data Operations so You Can Build Machine Learning and Advanced Analytics - Saket Saurabh, Co-founder and CEO (Nexla)
    15. Building the modern data platform - David Hsieh, CMO (Qubole)
    16. Right-Sizing Your Big Data Infrastructure - Tom Lyon, Founder and Chief Scientist (Drivescale)
    17. Deep Learning Solutions with BigDL - Radhika Rangarajan, Senior Technical Program Manager, Big Data (Intel)
    18. How the DataScience Cloud Helped Topix Optimize Advertising Decisions - William Merchan, CSO (DataScience)
    19. How to easily maximize performance and minimize cost on the cloud - Kunal Agarwal, CEOCTO (Unravel)
    20. Power BI in action - Sanjay Soni, Sr. Technical Product Marketing Manager (Microsoft)
    21. In-Memory Computing for Real-Time Big Data - Nikita Ivanov, CTO (Gridgain)
    22. Machine Learning Automation and Social Analytics - Siva Gopal and Devi Kondapi (MSR Cosmos)
    23. User-enabled Data Lakes with Open Source Kylo - Scott Reisdorf, Principal R Software Engineer (Think Big Analytics)
    24. User experience in data analytics platform - a design thinking approach - Naresh Agarwal, AVP, Brillio Data Practice (Brillio)
    25. Super Simple Hybrid Data Access - Sumit Sarkar, Sr. Manager, Product Marketing (Progress Software)
    26. Cloudera and SAS: Leaders Coming Together - Jesse Luebert, Solutions Architect (SAS)
    27. Big Data Managed Services by CenturyLink - Avinash Gupta, VP Sales and Marketing, Andrew Clyne, VP Chief Data Officer, James Foppe, Sr. Engineer (CenturyLink)
    28. Accelerating Data Science delivery with DevOps - Aziz Shamim, Solutions Engineering Manager, Americas Central (Github)
    29. Converge Machine Learning, Streaming Analytics, and BI with a GPU-accelerated In-Memory Analytics Database - Manan Goel, Vice President of Products (Kinetica)
    30. Scaling BI analytics for Hadoop cloud-based platforms - Priyank Patel, Co-founder and Chief Product Officer (Arcadia Data)
    31. Strategies for organizations to survive and thrive with data science - Cameron Sim, CEO (Crewspark)
  20. Business case studies
    1. Transamerica's journey to Customer 360 and beyond - Vishal Bamba (Transamerica), Rocky Tiwari (Transamerica)
    2. Delivering relevant filtered news to save hours of drudgery each day for fixed-income securities analysts - Alan Chaney (Bitvore Corp)
    3. Inside predictive intelligence, the powerful technology disrupting sales and marketing - Viral Bajaria (6Sense)
    4. From hours to milliseconds: How Verizon accelerated its mobile analytics - Todd Mostak (MapD), Abdul Subhan (Verizon Wireless)
    5. Saving lives with data: Identifying patients at risk of decline - Emily Spahn (ProKarma)
    6. The perfect conference: Using stochastic optimization to bring people together - Brian Lange (Datascope)
    7. The future of open data: Building businesses with a major national resource - Joel Gurin (Center for Open Data Enterprise)
    8. How Pinterest scaled to build the world’s catalog of 75+ billion ideas - Romit Jadhwani (Pinterest)
    9. A contextual real-time bidding engine for search engine marketing - Mahesh Goud T (Ticketmaster)
    10. Building a sustainable content ecosystem at Pinterest - Grace Huang (Pinterest)
  21. Strata Business Summit
    1. Executive Briefing: IoT and unconventional data - Teresa Tung (Accenture Labs)
    2. Executive Briefing: From data insights to action—Developing a data-driven company culture - Ashish Verma (Deloitte Consulting LLP)
    3. Executive Briefing: An executive’s guide to understanding advanced analytics in the cloud - Jerry Overton (CSC)
  22. Enterprise adoption
    1. FireEye's journey migrating 25 TB of RDBMS data to Hadoop - Ganesh Prabhu (FireEye), Vivek Agate (FireEye), Alex Rivlin (FireEye)
    2. Swipe, dip, and hover: Managing card payment data at Visa - Nandu Jayakumar (Visa), Rajesh Bhargava (Visa)
    3. Tuning Impala: The top five performance optimizations for the best BI and SQL analytics on Hadoop - Marcel Kornacker (Cloudera), Mostafa Mokhtar (Cloudera)
    4. Architecting an enterprise data hub in a 110-year-old company - Eric Richardson (American Chemical Society)
    5. Stream me up, Scotty: Transitioning to the cloud using a streaming data platform - Gwen Shapira (Confluent), Bob Lehmann (Monsanto)

Product information

  • Title: Strata + Hadoop World 2017 - San Jose, California
  • Author(s): O'Reilly Media, Inc.
  • Release date: March 2017
  • Publisher(s): O'Reilly Media, Inc.
  • ISBN: 9781491976159