Tableau Prep: Up & Running

Book description

For self-service data preparation, Tableau Prep is relatively easy to use—as long as you know how to clean and organize your datasets. Carl Allchin, from The Information Lab in London, gets you up to speed on Tableau Prep through a series of practical lessons that include methods for preparing, cleaning, automating, organizing, and outputting your datasets.

Based on Allchin’s popular blog, Preppin’ Data, this practical guide takes you step-by-step through Tableau Prep’s fundamentals. Self-service data preparation reduces the time it takes to complete data projects and improves the quality of your analyses. Discover how Tableau Prep helps you access your data and turn it into valuable information.

  • Know what to look for when you prepare data
  • Learn which Tableau Prep functions to use when working with data fields
  • Analyze the shape and profile of your dataset
  • Output data for analysis and learn how Tableau Prep automates your workflow
  • Learn how to clean your dataset using Tableau Prep functions
  • Explore ways to use Tableau Prep techniques in real-world scenarios
  • Make your data available to others by managing and documenting the output

Publisher resources

View/Submit Errata

Table of contents

  1. Preface
    1. Why I Wrote This Book
    2. Who This Book Is For
    3. How This Book Is Organized
    4. Acknowledgments
    5. Conventions Used in This Book
    6. Using Code Examples
    7. O’Reilly Online Learning
    8. How to Contact Us
  2. 1. Why Self-Service Data Prep?
    1. A Short History of Self-Service Data Visualization
    2. Accessing the “Right Data”
    3. The Self-Service Data Preparation Opportunity
    4. Tableau Prep Up and Running
    5. Summary
  3. I. Getting Started
  4. 2. Getting Started with Tableau Prep Builder
    1. Where to Get Tableau Prep Builder
    2. How to Get a License for Prep Builder
    3. The Tableau Prep Builder Screen
    4. Basic Steps of Data Preparation
      1. Input Step
      2. Clean Step
      3. Output Step
      4. Saving a Flow
    5. Summary
  5. 3. Planning Your Prep
    1. Stage 1: Know Your Data
    2. Stage 2: Identify the Desired State
    3. Stage 3: Determine the Required Transitions from KYD to the Desired State
    4. Stage 4: Build the Workflow
    5. Summary
  6. 4. Shaping Data
    1. What to Look for in Incoming Data Sets
    2. What Shape Is Best for Analysis in Tableau?
    3. Changing Data Set Structures in Prep Builder
      1. Pivot
      2. Aggregate
      3. Join
      4. Union
    4. Applying Restructuring Techniques to the Ice Cream Example
      1. Step 1: Pivot Columns to Rows
      2. Step 2: Pivot Rows to Columns
    5. Summary
  7. 5. Connecting to Data in Files
    1. Files Upon Files Upon Files
      1. Spreadsheets
      2. Other File Types
    2. Where to Find Your Data Files
    3. How to Connect to Files in Prep
    4. Considerations for Saving Flows with File Inputs
    5. Summary
  8. 6. Connecting to a Database
    1. What Is a Database?
    2. How to Connect to a Database Within Prep Builder
    3. When to Avoid Connecting to a Database
    4. Summary
  9. II. Data Types
  10. 7. Dealing with Numbers
    1. What Do We Mean by Numbers?
    2. Types of Numbers
    3. Category or Measure?
    4. Aggregation
    5. Formatting Numbers
    6. Functions for Mastering Numerical Data
    7. Summary
  11. 8. Dealing with Dates
    1. Why Are Dates Important?
    2. Parts of a Date
    3. Date Lookup Tables
    4. Epoch Dates
    5. Excel Serial Number
    6. Entering Dates
      1. The makedate() Function
      2. The dateparse() Function
    7. Summary
  12. 9. Dealing with String Data
    1. What Do We Mean by Strings?
    2. How String Data Is Different
      1. Character Order
      2. Formatting Considerations
    3. Common Functions for Preparing String Data
    4. Grouping and Replace Options for Working with String Data
    5. Summary
  13. 10. Dealing with Boolean Data
    1. What Is Boolean Data?
      1. Why Is It So Useful in Data Analysis?
      2. Functions Featuring Boolean Logic
    2. Summary
  14. III. The Shape of Data
  15. 11. Profiling Data
    1. What Is a Profile?
    2. Why Visualizing the Data Set Is Important
      1. Anscombe’s Quartet
      2. Visualizations Versus Data Tables
    3. How Prep Builder Profiles Data
      1. Generating Histograms and Mini-Histograms
      2. Selecting Summary Versus Detail Views
      3. Highlighting Values
      4. Viewing Dimension Counts
    4. Sorting
    5. Summary
  16. 12. Sampling Data Sets
    1. One Simple Rule: Use It All If Possible
    2. Sampling to Work Around Technical Limitations
      1. Volume of Data
      2. Velocity of Data
    3. Other Reasons for Sampling
      1. Reduce Build Times
      2. Determine What You Need
    4. Sampling Techniques
      1. Fixed Number of Rows
      2. Random Sample
    5. When Not to Sample
    6. Summary
  17. 13. Pivoting Columns to Rows
    1. When to Pivot in Tableau Prep Builder
    2. How to Pivot Columns to Rows
    3. Summary
  18. 14. Pivoting Rows to Columns
    1. When to Use a Rows-to-Columns Pivot
    2. How to Pivot Rows to Columns
    3. Summary
  19. 15. Aggregating in Prep Builder
    1. Comparing Calculations in Prep Builder and Desktop
    2. Which Calculations in Prep Builder Differ?
    3. Adding the Aggregate Step
    4. Where’s the Rest of My Data?
    5. Level of Detail Calculation Option
    6. Summary
  20. 16. Joining Data Sets Together
    1. How to Join Data Sets in Prep Builder
    2. Join Logic and Terminology
    3. Types of Join in Prep Builder
    4. When to Use Each Join Type
    5. Summary
  21. 17. Unioning
    1. What Is a Union?
    2. What If the Data Structure Isn’t Identical?
    3. When to Union Data
      1. Monthly Data Sets
      2. Data Sets from Web Sources
      3. Company Mergers
    4. Multiple Tables and Wildcard Unions
    5. Summary
  22. 18. Calculations
    1. What Do Calculations Do in Data Preparation?
    2. Creating a Calculated Field
    3. Fundamentals of Calculations
      1. The Reference List
      2. Syntax
      3. Description
      4. Example
    4. Building the Calculation
      1. When Calculations Go Well
      2. When Calculations Go Poorly
      3. Editing Calculated Fields
      4. Recommendations
    5. Types of Calculations
      1. Numerical Calculations
      2. String Calculations
      3. Date Calculations
      4. Conditional Calculations with a Boolean Output
      5. Logical Calculations
      6. Type Conversions
    6. Level of Detail and Ranking Calculations
    7. Summary
  23. IV. Output
  24. 19. Choosing an Output
    1. Types of Output
      1. Publish to Files
      2. Publish to Tableau Server
    2. When to Output Data in Prep Builder
      1. Outputting Data in the Output Step
      2. Previewing Output Data in Desktop
    3. Other Considerations for Output Data
    4. Summary
  25. 20. Outputting to a Database
    1. When to Write to a Database
      1. Clean Data
      2. Simplified Joins
      3. Staging and Reference Tables
    2. Setup for Writing to a Database
    3. What to Watch Out For
    4. Summary
  26. 21. Getting Started with Tableau Prep Conductor
    1. When to Use Prep Conductor
    2. How to Get Prep Conductor
    3. Loading a Flow to Prep Conductor
    4. Other Benefits of Using Prep Conductor
    5. Summary
  27. V. Cleaning Data
  28. 22. Creating Additional Data
    1. When Not to Create Data
      1. Dynamic Calculations in Desktop
      2. Duplicate Records from Joins
    2. Creating Additional Columns
      1. Using Calculations
      2. Pivoting Rows to Columns
      3. Joining Data Sets
    3. Creating Additional Rows
      1. Pivoting Columns to Rows
      2. Unioning Data Sets
      3. Scaffolding Data Sets
      4. Joining Data Sets
    4. Summary
  29. 23. Filtering
    1. What Is a Filter?
    2. Different Types of Filters
      1. Selection
      2. Calculation
      3. Wildcard
      4. Null Values
    3. When to Filter Out Columns
    4. When to Filter Out Rows
    5. Summary
  30. 24. Removing Data During Input
    1. Changing Your Data Set Before Loading It
    2. Slow Performance, Slow Build, Slow Output
    3. Removing Columns
    4. Removing Records
    5. Summary
  31. 25. Splitting Data Fields
    1. Basic Splits
    2. Advanced Splits: When Automatic Splits Don’t Work as Intended
    3. When Not to Split Data
      1. Address Data
      2. No Clear Delimiter
    4. Summary
  32. 26. Cleaning by Grouping Data
    1. What Does Grouping Mean?
    2. Why Use Grouping
      1. Improving Accuracy
      2. Navigating the Data Hierarchy
      3. Smoothing Reorganizations
    3. Grouping Techniques
      1. Manual
      2. Calculations
      3. Built-in Functionality
    4. Summary
  33. 27. Dealing with Nulls
    1. What Is a Null?
    2. When Is a Null OK?
    3. How to Remove or Replace a Null
      1. ISNULL()
      2. ZN()
      3. Merge
    4. Summary
  34. 28. Using Data Roles
    1. How to Use Data Roles
    2. Custom Data Roles
    3. Summary
  35. 29. Dealing with Unwanted Characters
    1. What Is an Unwanted Character?
    2. Issues Caused by Unwanted Characters
    3. Removing Unwanted Characters
      1. Strings with Mistyped Characters
      2. Numbers with Unwanted Characters
      3. Dates with Mistyped Characters
    4. Summary
  36. 30. Deduplicating
    1. How to Identify Duplicates
    2. Causes of Duplicates
      1. System Loads
      2. Row per Measure
      3. Joins
    3. How to Handle Duplicates
      1. Aggregating: Technique 1
      2. Aggregating: Technique 2
      3. Pivoting Rows to Columns
    4. Summary
  37. 31. Using Regular Expressions
    1. What Are Regular Expressions?
    2. How to Use Regexes in Prep
    3. REGEXP_EXTRACT() and REGEXP_EXTRACT_NTH()
      1. REGEXP_MATCH()
      2. REGEXP_REPLACE()
    4. Regex Use Cases
      1. Replacing Common Mistakes
      2. Anonymizing Comments or Feedback
    5. Common Regex Commands
    6. Summary
  38. 32. Completing Advanced Joins
    1. Multiple Join Conditions
    2. Join Conditions Other Than Equals
      1. Filtering with a Join
      2. Joining by a Range
    3. OR Statements
    4. Summary
  39. 33. Creating Level of Detail Calculations
    1. What Is Appending?
    2. Exploring Appending Through LOD Calculations
      1. When to Use an LOD Calculation
      2. How to Write an LOD Calculation in Prep Builder
      3. What a Level of Detail Calculation Is Doing
    3. Summary
  40. 34. Doing Analytical Calculations
    1. What Is a Table Calculation?
    2. Applying Table Calculation Logic in Prep Builder
      1. Keywords
      2. Analytical Calculations
    3. Use Cases
      1. Filtering for the Top N
      2. Filtering Out a Percentage of Data
    4. Summary
  41. VI. Beyond the Basics
  42. 35. Breaking Down Complex Data Preparation Challenges
    1. The Challenge
    2. Where to Begin
    3. Logical Steps
    4. Making Changes
    5. Be Ready to Iterate
    6. Summary
  43. 36. Handling Free Text
    1. What Is Free Text?
    2. Why Is Free Text Useful?
    3. How to Analyze Free Text in Tableau
      1. Split the Strings
      2. Pivot Columns to Rows
      3. Clean Cases and Punctuation
      4. Use a Join to Remove Common Words
      5. Group the Remaining Values
    4. Summary
  44. 37. Using Smarter Filtering
    1. Calculations
      1. Boolean Calculations
      2. Logical Calculations
      3. Regex Calculations
    2. Join Ranges
    3. Percentage Variance
      1. Manual Entry: Level of Detail Calculations
      2. Reloaded Data: Join to Previous Output
      3. Aggregating the Average Production Cost per Type
      4. Joining the Data Sets Together
    4. Combining Techniques
    5. Summary
  45. 38. Managing Conversion Rates
    1. Challenges of Conversion Rates
    2. Applying Conversion Rates in Prep
      1. Step 1: Create a Consistent Granularity of Data for the Conversion
      2. Step 2: Join the Data Sets Together
      3. Step 3: Apply the Conversion Rate
    3. Long-Term Strategies for Conversion Rates
      1. Managing Frequency
      2. Maintaining History Tables
    4. Summary
  46. 39. Scaffolding Your Data
    1. What Is Scaffolding?
    2. Challenges Addressed by Scaffolding
    3. Challenges Created by Scaffolding
    4. The Traditional Scaffolding Technique
      1. Step 1: Input the Data Sets
      2. Step 2: Build the Join Calculations
      3. Step 3: Join the Two Data Sets Together
      4. Step 4: Filter Out Unnecessary Rows
    5. The Newer Scaffolding Technique
      1. Step 1: Input the Data Sets
      2. Step 2: Join the Data Sets
      3. Step 3: Add the Reporting Date
      4. Step 4: Remove the Scaffold Value
    6. The Result
    7. Summary
  47. 40. Connecting to Programming Scripts
    1. When to Use the Script Step in Prep
    2. Setting Up Your Computer to Use Scripts in Prep
    3. Using a Script Step
    4. Summary
  48. 41. Handling Prep Builder Errors
    1. Parameter Errors
    2. Blank Profile Panes or Data Panes
      1. Changing a Calculation or Removing a Data Field Downstream
      2. The Data Source Has Changed
    3. Errors Within a Calculated Field
      1. Incomplete Calculations
      2. Unsupported Functions
    4. Summary
  49. VII. Managing Your Data
  50. 42. Documenting Your Data Preparation
    1. Basic Documentation
      1. Folder Structure
      2. Filenames
      3. Data Sources
      4. Output
    2. Step Names
    3. Clean Step
    4. Step Descriptions
    5. Color
    6. Joins
    7. Unions
    8. Summary
  51. 43. Deciding Where to Prepare Your Data
    1. Processes to Consider
    2. Data Preparation Versus Visual Analytics
      1. Data Literacy
      2. Organization Size
      3. Quality of Technological Hardware
      4. History of Data Investment
    3. Software Performance
      1. Sampling
      2. Functionality
      3. Documentation
    4. Summary
  52. 44. Managing Data
    1. What Is Sensitive Data?
      1. Public
      2. Confidential
      3. Strictly Confidential
      4. Restricted
    2. Managing Data Based on Sensitivity
    3. Production Versus Development Environments
    4. Deleting Data
      1. When Data Becomes Outdated or Irrelevant
      2. When a Customer or Client Leaves
    5. Summary
  53. 45. Storing Your Data
    1. Inaccessibility
      1. Don’t Break the Law
      2. Don’t Delete Operational Data
      3. Do Grant Access to Data for the Experts
      4. Do Document Your Sources
    2. Slow/Unresponsive Performance
    3. Overwriting Risks
      1. Grant Read-Only Access
      2. Train Before Publishing
    4. So, Where Do You Write That Output?
    5. Summary
  54. 46. Using Identifiers and Keys in Data
    1. What Is an Identifier?
    2. What Is a Key in a Database?
    3. Using Keys and Identifiers in Prep
    4. Creating Identifier Data Fields in Prep Builder
    5. Summary
  55. 47. Keeping Your Data Up-to-Date
    1. Refreshing Data
    2. Full Versus Incremental Refreshes
    3. Setting Up Different Types of Refresh
      1. Full Refresh
      2. Incremental Refresh
    4. What to Watch Out for When Refreshing Data Sources
      1. Changing Data Values
      2. Altering the Structure of Sources
      3. New Data, New Input
    5. Summary
  56. 48. Using History Tables
    1. Why Are History Tables Required?
    2. What to Consider When Creating History Tables
      1. Ability to Join to Live Data
      2. Relevance of Information
      3. Frequency of Updates
      4. Level of Granularity
    3. Performance
    4. Data Regulations
    5. An Example History Table
    6. Summary
  57. 49. Evaluating Whether You Need Prep Builder at All
    1. A History of Data Preparation in Tableau
    2. Where to Try Desktop First
      1. Simple Joins
      2. Unions
      3. Single Pivots
    3. Where to Start with Prep Builder
    4. Summary
  58. 50. Final Thoughts
  59. Index

Product information

  • Title: Tableau Prep: Up & Running
  • Author(s): Carl Allchin
  • Release date: September 2020
  • Publisher(s): O'Reilly Media, Inc.
  • ISBN: 9781492079620