Data Warehousing in the Age of Artificial Intelligence

Book description

Nearly 7,000 new mobile applications appear every day, and a constant stream of data gives them life. Many organizations rely on a predictive analytics model to turn that data into useful business information, and they must keep its predictions accurate as the data changes. Building and maintaining such a model can be a complex, time-consuming process.

This book shows how to automate and accelerate that process using machine learning (ML) on a modern data warehouse that runs on any cloud. Product specialists from MemSQL explain how modern data warehouses provide the foundation for implementing ML algorithms that run efficiently.

Through several real-time use cases, you’ll learn how to quickly identify the right metrics to make actionable business decisions. This book explores foundational ML and artificial intelligence concepts to help you understand:

  • How data warehouses accelerate deployment and simplify manageability
  • How companies choose between cloud and on-premises deployments for building data processing applications
  • Ways to build analytics and visualizations for business intelligence on historical data
  • The technologies and architecture for building and deploying real-time data pipelines

This book demonstrates specific models and examples for building supervised and unsupervised real-time ML applications, and gives practical advice on how to choose between building an ML pipeline and buying an existing solution. If you need to use data accurately and efficiently, a real-time data warehouse is a critical business tool.

Table of contents

  1. The Role of a Modern Data Warehouse in the Age of AI
    1. Actors: Run Business, Collect Data
      1. Applications Producing Data
      2. Enterprise Applications
    2. Operators: Analyze and Refine Operations
      1. Targeting the Appropriate Metric
      2. Accelerating Predictions with ML
    3. The Modern Data Warehouse for an ML Feedback Loop
      1. Dynamic Feedback Loop Between Actors and Operators
  2. Framing Data Processing with ML and AI
    1. Foundations of ML and AI for Data Warehousing
      1. AI
      2. ML
      3. Deep Learning
    2. Practical Definitions of ML and Data Science
      1. The Emergence of Professional Data Science
      2. Developing and Deploying Models
      3. Automating Dynamic ML Systems
    3. Supervised ML
      1. Regression
      2. Classification
    4. Unsupervised ML
      1. Cluster Analysis
    5. Online Learning
    6. The Future of AI for Data Processing
      1. The Distributed Era
      2. Advantages of Distributed Datastores
      3. The Future of AI Augmented Datastores
  3. The Data Warehouse Has Changed
    1. The Birth of the Data Warehouse
      1. New Performance, Limited Flexibility
    2. The Emergence of the Data Lake
    3. A New Class of Data Warehousing
  4. The Path to the Cloud
    1. Cloud Is the New Datacenter
      1. Architectural Considerations for Cloud Computing
    2. Moving to the Cloud
      1. Cost Optimization
      2. Revenue Creation
    3. Choosing the Right Path to the Cloud
  5. Historical Data
    1. Business Intelligence on Historical Data
      1. Scalable BI
      2. Query Optimization for Distributed Data Warehouses
    2. Delivering Customer Analytics at Scale
      1. Scale-Out Architecture
      2. Columnstore Query Execution
      3. Intelligent Data Distribution
    3. Examples of Analytics at the Largest Companies
      1. Rise of Data Capture for Analytics
      2. App Store Example
  6. Building Real-Time Data Pipelines
    1. Technologies and Architecture to Enable Real-Time Data Pipelines
      1. High-Throughput Messaging Systems
      2. Data Transformation
      3. Operational Datastore
    2. Data Processing Requirements
      1. Memory Optimization
      2. Access to Real-Time and Historical Data
      3. Compiled Query Execution Plans
      4. Multiversion Concurrency Control
      5. Fault Tolerance and ACID Compliance
    3. Benefits from Batch to Real-Time Learning
  7. Combining Real Time with Machine Learning
    1. Real-Time ML Scenarios
      1. Supervised and Unsupervised
      2. Continuous and Categorical
    2. Supervised Learning Techniques and Applications
      1. Regression
      2. Categorical: Classification
      3. Determining Whether a Data Point Belongs to a Class by Using Logistic Regression
    3. Unsupervised Learning Applications
      1. Continuous: Real-Time Clustering
      2. Categorical: Real-Time Unsupervised Classification with Neural Networks
  8. Building the Ideal Stack for Machine Learning
    1. Example of an ML Data Pipeline
      1. New Data and Historical Data
      2. Model Training
      3. Scoring in Production
    2. Technologies That Power ML
      1. Programming Stack: R, Matlab, Python, and Scala
      2. Analytics Stack: Numpy/Scipy, TensorFlow, Theano, and MLlib
      3. Visualization Tools: Business Intelligence, Graphing Libraries, and Custom Dashboards
    3. Top Considerations
      1. Ingest Performance
      2. Analytics Performance
      3. Distributed Data Processing
  9. Strategies for Ubiquitous Deployment
    1. Introduction to the Hybrid Cloud Model
      1. Single Application Stack
      2. Use Case-Centric
      3. Multicloud
    2. On-Premises Flexibility
    3. Hybrid Cloud Deployments
      1. High Availability and Disaster Recovery in the Cloud
      2. Test and Development
    4. Multicloud
    5. Charting an On-Premises-to-Cloud Security Plan
      1. Common Security Requirements
  10. Real-Time Machine Learning Use Cases
    1. Overview of Use Cases
      1. Choosing the Correct Data Warehouse
    2. Energy Sector
      1. Goal: Anomaly Detection for the Internet of Things
      2. Approach: Real-Time Sensor Data to Manage Risk
      3. Goal: Take Control of Metering Equipment
      4. Approach: Use Predictive Analytics to Drive Efficiencies
      5. Implementation Outcomes
    3. Thorn
      1. Goal: Use Technology to Help End Child Sexual Exploitation
      2. Approach: ML Image Recognition to Identify Victims
      3. Implementation Outcomes
    4. Tapjoy
      1. Goal: Determine the Best Ads to Serve Based on Previous Behavior and Segmentation
      2. Approach: Real-Time Ad Optimization to Boost Revenue
      3. Implementation Outcomes
    5. Reference Architecture
      1. Datasets and Sample Queries
  11. The Future of Data Processing for Artificial Intelligence
    1. Data Warehouses Support More and More ML Primitives
      1. Expressing More of ML Models in SQL, Pushing More Computation to the Database
      2. External ML Libraries/Frameworks Could Push Down Computation
      3. ML in Distributed Systems
    2. Toward Intelligent, Dynamic ML Systems

Product information

  • Title: Data Warehousing in the Age of Artificial Intelligence
  • Author(s): Gary Orenstein, Conor Doherty, Mike Boyarski, Eric Boutin
  • Release date: October 2017
  • Publisher(s): O'Reilly Media, Inc.
  • ISBN: 9781491997956