Agile Data Science 2.0
Building Full-Stack Data Analytics Applications with Spark
By Russell Jurney
Publisher: O'Reilly Media
Final Release Date: June 2017
Pages: 352

Data science teams looking to turn research into useful analytics applications require not only the right tools, but also the right approach if they're to succeed. With the revised second edition of this hands-on guide, up-and-coming data scientists will learn how to use the Agile Data Science development methodology to build data applications with Python, Apache Spark, Kafka, and other tools. Author Russell Jurney demonstrates how to compose a data platform for building, deploying, and refining analytics applications with Apache Kafka, MongoDB, ElasticSearch, d3.js, scikit-learn, and Apache Airflow. You'll learn an iterative approach that lets you quickly change the kind of analysis you're doing, depending on what the data is telling you. Publish data science work as a web application, and effect meaningful change in your organization.

Build value from your data in a series of agile sprints, using the data-value pyramid

Extract features for statistical models from a single dataset

Visualize data with charts, and expose different aspects through interactive reports

Use historical data to predict the future via classification and regression

Translate predictions into actions

Get feedback from users after each sprint to keep your project on track

Setup

Chapter 1 Theory
  Introduction
  Definition
  The Problem with the Waterfall
  The Problem with Agile Software
  The Data Science Process
  Notes on Process

Chapter 2 Agile Tools
  Scalability = Simplicity
  Agile Data Science Data Processing
  Local Environment Setup
  EC2 Environment Setup
  Getting and Running the Code
  Touring the Toolset
  Conclusion

Chapter 3 Data
  Air Travel Data
  Weather Data
  Data Processing in Agile Data Science
  SQL Versus NoSQL
  Conclusion

Climbing the Pyramid

Chapter 4 Collecting and Displaying Records
  Putting It All Together
  Collecting and Serializing Flight Data
  Processing and Publishing Flight Records
  Presenting Flight Records in a Browser
  Agile Checkpoint
  Listing Flights
  Searching for Flights
  Conclusion

Chapter 5 Visualizing Data with Charts and Tables
  Chart Quality: Iteration Is Essential
  Scaling a Database in the Publish/Decorate Model
  Exploring Seasonality
  Extracting Metal (Airplanes [Entities])
  Data Enrichment
  Conclusion

Chapter 6 Exploring Data with Reports
  Extracting Airlines (Entities)
  Curating Ontologies of Semi-structured Data
  Improving Airlines
  Investigating Airplanes (Entities)
  Conclusion

Chapter 7 Making Predictions
  The Role of Predictions
  Predict What?
  Introduction to Predictive Analytics
  Exploring Flight Delays
  Extracting Features with PySpark
  Building a Regression with scikit-learn
  Building a Classifier with Spark MLlib
  Conclusion

Chapter 8 Deploying Predictive Systems
  Deploying a scikit-learn Application as a Web Service
  Deploying Spark ML Applications in Batch with Airflow
  Deploying Spark ML via Spark Streaming
  Conclusion

Chapter 9 Improving Predictions
  Fixing Our Prediction Problem
  When to Improve Predictions
  Improving Prediction Performance
  Incorporating Airplane Data
  Incorporating Flight Time
  Conclusion

Appendix Manual Installation
  Installing Hadoop
  Installing Spark
  Installing MongoDB
  Installing the MongoDB Java Driver
  Installing mongo-hadoop
  Installing Elasticsearch
  Installing Elasticsearch for Hadoop
  Setting Up Our Spark Environment
  Installing Kafka
  Installing scikit-learn
  Installing Zeppelin

