Data Visualization with Python and JavaScript

Book description

Learn how to turn raw data into rich, interactive web visualizations with the powerful combination of Python and JavaScript. With this hands-on guide, author Kyran Dale teaches you how to build a basic dataviz toolchain with best-of-breed Python and JavaScript libraries, including Scrapy, Matplotlib, Pandas, Flask, and D3, for crafting engaging, browser-based visualizations.

As a working example throughout the book, Dale walks you through transforming Wikipedia’s table-based list of Nobel Prize winners into an interactive visualization. You’ll examine steps along the entire toolchain, from scraping, cleaning, exploring, and delivering data to building the visualization with JavaScript’s D3 library. If you’re ready to create your own web-based data visualizations, and know either Python or JavaScript, this is the book for you.

  • Learn how to manipulate data with Python
  • Understand the commonalities between Python and JavaScript
  • Extract information from websites by using Python’s web-scraping tools, BeautifulSoup and Scrapy
  • Clean and explore data with Python’s Pandas, Matplotlib, and NumPy libraries
  • Serve data and create RESTful web APIs with Python’s Flask framework
  • Create engaging, interactive web visualizations with JavaScript’s D3 library

Table of contents

  1. Preface
    1. Conventions Used in This Book
    2. Using Code Examples
    3. O’Reilly Safari
    4. How to Contact Us
    5. Acknowledgments
  2. Introduction
    1. Who This Book Is For
      1. Minimal Requirements to Use This Book
    2. Why Python and JavaScript?
      1. Why Not Python on the Browser?
      2. Why Python for Data Processing
      3. Python’s Getting Better All the Time
    3. What You’ll Learn
      1. The Choice of Libraries
      2. Preliminaries
    4. The Dataviz Toolchain
      1. 1. Scraping Data with Scrapy
      2. 2. Cleaning Data with Pandas
      3. 3. Exploring Data with Pandas and Matplotlib
      4. 4. Delivering Your Data with Flask
      5. 5. Transforming Data into Interactive Visualizations with D3
      6. Smaller Libraries
    5. Using the Book
    6. A Little Bit of Context
    7. Summary
    8. Recommended Books
  3. 1. Development Setup
    1. The Accompanying Code
    2. Python
      1. Anaconda
      2. Checking the Anaconda Install
      3. Installing Extra Libraries
      4. Virtual Environments
    3. JavaScript
      1. Content Delivery Networks
      2. Installing Libraries Locally
    4. Databases
      1. Installing MongoDB
    5. Integrated Development Environments
    6. Summary
  4. I. Basic Toolkit
  5. 2. A Language-Learning Bridge Between Python and JavaScript
    1. Similarities and Differences
    2. Interacting with the Code
      1. Python
      2. JavaScript
    3. Basic Bridge Work
      1. Style Guidelines, PEP 8, and use strict
      2. CamelCase Versus Underscore
      3. Importing Modules, Including Scripts
      4. Keeping Your Namespaces Clean
      5. Outputting “Hello World!”
      6. Simple Data Processing
      7. String Construction
      8. Significant Whitespace Versus Curly Brackets
      9. Comments and doc-strings
      10. Declaring Variables, var
      11. Strings and Numbers
      12. Booleans
      13. Data Containers: Dicts, Objects, Lists, Arrays
      14. Functions
      15. Iterating: for Loops and Functional Alternatives
      16. Conditionals: if, else, elif, switch
      17. File Input and Output
      18. Classes and Prototypes
    4. Differences in Practice
      1. Method Chaining
      2. Enumerating a List
      3. Tuple Unpacking
      4. Collections
      5. Underscore
      6. Functional Array Methods and List Comprehensions
      7. Map, Reduce, and Filter with Python’s Lambdas
      8. JavaScript Closures and the Module Pattern
      9. This Is That
    5. A Cheat Sheet
    6. Summary
  6. 3. Reading and Writing Data with Python
    1. Easy Does It
    2. Passing Data Around
    3. Working with System Files
    4. CSV, TSV, and Row-Column Data Formats
    5. JSON
      1. Dealing with Dates and Times
    6. SQL
      1. Creating the Database Engine
      2. Defining the Database Tables
      3. Adding Instances with a Session
      4. Querying the Database
      5. Easier SQL with Dataset
    7. MongoDB
    8. Dealing with Dates, Times, and Complex Data
    9. Summary
  7. 4. Webdev 101
    1. The Big Picture
    2. Single-Page Apps
    3. Tooling Up
      1. The Myth of IDEs, Frameworks, and Tools
      2. A Text-Editing Workhorse
      3. Browser with Development Tools
      4. Terminal or Command Prompt
    4. Building a Web Page
      1. Serving Pages with HTTP
      2. The DOM
      3. The HTML Skeleton
      4. Marking Up Content
      5. CSS
      6. JavaScript
      7. Data
    5. Chrome’s Developer Tools
      1. The Elements Tab
      2. The Sources Tab
      3. Other Tools
    6. A Basic Page with Placeholders
      1. Filling the Placeholders with Content
    7. Scalable Vector Graphics
      1. The <svg> Element
      2. The <g> Element
      3. Circles
      4. Applying CSS Styles
      5. Lines, Rectangles, and Polygons
      6. Text
      7. Paths
      8. Scaling and Rotating
      9. Working with Groups
      10. Layering and Transparency
      11. JavaScripted SVG
    8. Summary
  8. II. Getting Your Data
  9. 5. Getting Data off the Web with Python
    1. Getting Web Data with the requests Library
    2. Getting Data Files with requests
    3. Using Python to Consume Data from a Web API
      1. Using a RESTful Web API with requests
      2. Getting Country Data for the Nobel Dataviz
    4. Using Libraries to Access Web APIs
      1. Using Google Spreadsheets
      2. Using the Twitter API with Tweepy
    5. Scraping Data
      1. Why We Need to Scrape
      2. BeautifulSoup and lxml
      3. A First Scraping Foray
    6. Getting the Soup
    7. Selecting Tags
      1. Crafting Selection Patterns
      2. Caching the Web Pages
      3. Scraping the Winners’ Nationalities
    8. Summary
  10. 6. Heavyweight Scraping with Scrapy
    1. Setting Up Scrapy
    2. Establishing the Targets
    3. Targeting HTML with Xpaths
      1. Testing Xpaths with the Scrapy Shell
      2. Selecting with Relative Xpaths
    4. A First Scrapy Spider
    5. Scraping the Individual Biography Pages
    6. Chaining Requests and Yielding Data
      1. Caching Pages
      2. Yielding Requests
    7. Scrapy Pipelines
    8. Scraping Text and Images with a Pipeline
      1. Specifying Pipelines with Multiple Spiders
    9. Summary
  11. III. Cleaning and Exploring Data with Pandas
  12. 7. Introduction to NumPy
    1. The NumPy Array
      1. Creating Arrays
      2. Array Indexing and Slicing
      3. A Few Basic Operations
    2. Creating Array Functions
      1. Calculating a Moving Average
    3. Summary
  13. 8. Introduction to Pandas
    1. Why Pandas Is Tailor-Made for Dataviz
    2. Why Pandas Was Developed
    3. Heterogeneous Data and Categorizing Measurements
    4. The DataFrame
      1. Indices
      2. Rows and Columns
      3. Selecting Groups
    5. Creating and Saving DataFrames
      1. JSON
      2. CSV
      3. Excel Files
      4. SQL
      5. MongoDB
    6. Series into DataFrames
    7. Panels
    8. Summary
  14. 9. Cleaning Data with Pandas
    1. Coming Clean About Dirty Data
    2. Inspecting the Data
    3. Indices and Pandas Data Selection
      1. Selecting Multiple Rows
    4. Cleaning the Data
      1. Finding Mixed Types
      2. Replacing Strings
      3. Removing Rows
      4. Finding Duplicates
      5. Sorting Data
      6. Removing Duplicates
      7. Dealing with Missing Fields
      8. Dealing with Times and Dates
    5. The Full clean_data Function
    6. Saving the Cleaned Dataset
      1. Merging DataFrames
    7. Summary
  15. 10. Visualizing Data with Matplotlib
    1. Pyplot and Object-Oriented Matplotlib
    2. Starting an Interactive Session
    3. Interactive Plotting with Pyplot’s Global State
      1. Configuring Matplotlib
      2. Setting the Figure’s Size
      3. Points, Not Pixels
      4. Labels and Legends
      5. Titles and Axes Labels
      6. Saving Your Charts
    4. Figures and Object-Oriented Matplotlib
      1. Axes and Subplots
    5. Plot Types
      1. Bar Charts
      2. Scatter Plots
    6. Seaborn
      1. FacetGrids
      2. Pairgrids
    7. Summary
  16. 11. Exploring Data with Pandas
    1. Starting to Explore
    2. Plotting with Pandas
    3. Gender Disparities
      1. Unstacking Groups
      2. Historical Trends
    4. National Trends
      1. Prize Winners per Capita
      2. Prizes by Category
      3. Historical Trends in Prize Distribution
    5. Age and Life Expectancy of Winners
      1. Age at Time of Award
      2. Life Expectancy of Winners
      3. Increasing Life Expectancies over Time
    6. The Nobel Diaspora
    7. Summary
  17. IV. Delivering the Data
  18. 12. Delivering the Data
    1. Serving the Data
      1. Organizing Your Flask Files
      2. Serving Data with Flask
    2. Delivering Static Files
    3. Dynamic Data with Flask
      1. A Simple RESTful API with Flask
    4. Using Static or Dynamic Delivery
    5. Summary
  19. 13. RESTful Data with Flask
    1. A RESTful, MongoDB API with Eve
      1. Using AJAX to Access the API
    2. Delivering Data to the Nobel Prize Visualization
    3. RESTful SQL with Flask-Restless
      1. Creating the API
      2. Adding CORS Support
      3. Querying the API
    4. Summary
  20. V. Visualizing Your Data with D3
  21. 14. Imagining a Nobel Visualization
    1. Who Is It For?
    2. Choosing Visual Elements
    3. Menu Bar
    4. Prizes by Year
    5. A Map Showing Selected Nobel Countries
    6. A Bar Chart Showing Number of Winners by Country
    7. A List of the Selected Winners
      1. A Mini-Biography Box with Picture
    8. The Complete Visualization
    9. Summary
  22. 15. Building a Visualization
    1. Preliminaries
      1. Core Components
      2. Organizing Your Files
      3. Serving the Data
    2. The HTML Skeleton
    3. CSS Styling
    4. The JavaScript Engine
      1. Importing the Scripts
      2. Basic Data Flow
      3. The Core Code
      4. Initializing the Nobel Prize Visualization
      5. Ready to Go
      6. Data-Driven Updates
      7. Filtering Data with Crossfilter
    5. Running the Nobel Prize Visualization App
    6. Summary
  23. 16. Introducing D3—The Story of a Bar Chart
    1. Framing the Problem
    2. Working with Selections
    3. Adding DOM Elements
    4. Leveraging D3
    5. Measuring Up with D3’s Scales
      1. Quantitative Scales
      2. Ordinal Scales
    6. Unleashing the Power of D3 with Data Binding
    7. The enter Method
    8. Accessing the Bound Data
    9. The Update Pattern
    10. Axes and Labels
    11. Transitions
    12. Summary
  24. 17. Visualizing Individual Prizes
    1. Building the Framework
    2. Scales
    3. Axes
    4. Category Labels
    5. Nesting the Data
    6. Adding the Winners with a Nested Data-Join
    7. A Little Transitional Sparkle
    8. Summary
  25. 18. Mapping with D3
    1. Available Maps
    2. D3’s Mapping Data Formats
      1. GeoJSON
      2. TopoJSON
      3. Converting Maps to TopoJSON
    3. D3 Geo, Projections, and Paths
      1. Projections
      2. Paths
      3. Graticules
    4. Putting the Elements Together
    5. Updating the Map
    6. Adding Value Indicators
    7. Our Completed Map
    8. Building a Simple Tooltip
    9. Summary
  26. 19. Visualizing Individual Winners
    1. Building the List
    2. Building the Bio-Box
    3. Summary
  27. 20. The Menu Bar
    1. Creating HTML Elements with D3
    2. Building the Menu Bar
      1. Building the Category Selector
      2. Adding the Gender Selector
      3. Adding the Country Selector
      4. Wiring Up the Metric Radio Button
    3. Summary
  28. 21. Conclusion
    1. Recap
      1. Part I, Basic Toolkit
      2. Part II, Getting Your Data
      3. Part III, Cleaning and Exploring Data with Pandas
      4. Part IV, Delivering the Data
      5. Part V, Visualizing Your Data with D3
    2. Future Progress
      1. Visualizing Social Media Networks
      2. Interactive Mapping with Leaflet and Folium
      3. Machine-Learning Visualizations
    3. Final Thoughts
  29. A. Moving from Development to Production
    1. The Starting Directory
    2. Configuration
      1. Configuring Flask
      2. Configuring the JavaScript App
    3. Authentication
    4. Testing Flask Apps
    5. Testing JavaScript Apps
    6. Deploying Flask Apps
      1. Configuring Apache
    7. Logging and Error Handling
  30. Index

Product information

  • Title: Data Visualization with Python and JavaScript
  • Author(s): Kyran Dale
  • Release date: July 2016
  • Publisher(s): O’Reilly Media, Inc.
  • ISBN: 9781491920510