Agile Data Science 2.0
Building Full-Stack Data Analytics Applications with Spark
Publisher: O'Reilly Media
Final Release Date: September 2016
Pages: 325

With Early Release ebooks, you get books in their earliest form—the author's raw and unedited content as he or she writes—so you can take advantage of these technologies long before the official release of these titles. You’ll also receive updates when significant changes are made, new chapters are available, and the final ebook bundle is released.

Building analytics products at scale requires a deep investment in people, machines, and time. How can you be sure you’re building the right models that people will pay for? With this hands-on book, you’ll learn a flexible toolset and methodology for building effective analytics applications with Spark.

Using lightweight tools such as Python, PySpark, Elastic MapReduce, MongoDB, Elasticsearch, Doc2vec, deep learning, D3.js, Leaflet, Docker, and Heroku, your team will create an agile environment for exploring data, starting with an example application that mines flight data into an analytic product. You’ll learn an iterative approach that enables you to quickly change the kind of analysis you’re doing, depending on what the data is telling you. All example code in this book is available as working applications.

  • Create analytics applications by using the Agile Data Science development methodology
  • Build value from your data in a series of agile sprints, using the data-value pyramid
  • Learn how to build and deploy predictive analytics using Kafka and Spark Streaming
  • Extract features for statistical models from a single dataset
  • Visualize data with charts, and expose different aspects through interactive reports
  • Use historical data to predict the future via classification and regression
  • Translate predictions into actions
  • Get feedback from users after each sprint to keep your project on track

Customer rating: 5.0 (based on 2 reviews)

Ratings Distribution

  • 5 Stars (2)
  • 4 Stars (0)
  • 3 Stars (0)
  • 2 Stars (0)
  • 1 Star (0)


(1 of 3 customers found this review helpful)

 
5.0

Best book I've ever written

By Russell the Jurney

from Pacifica, CA

About Me Author, Developer, Educator

Verified Reviewer

Pros

  • Accurate
  • Concise
  • Easy to understand
  • Helpful examples
  • Well-written

Cons

Best Uses

  • Intermediate
  • Novice

Comments about Agile Data Science 2.0:

I am very proud of this book. It has 200 new pages, and every page was rewritten. The theory chapter is greatly expanded and now constitutes a brief introduction to an agile methodology. The book is updated with PySpark, Spark SQL, Spark MLlib, Spark Streaming and Kafka.

     
5.0

Approachable and Clear

By Jay

from Atlanta, GA

About Me Developer

Verified Reviewer

Pros

  • Concise
  • Easy to understand
  • Helpful examples
  • Well-written

Cons

Best Uses

  • Intermediate
  • Novice
  • Student

Comments about Agile Data Science 2.0:

This book is really approachable. The examples are clear and really helped me understand ways to analyze large datasets.

Print (pre-order): $44.99, May 2017 (est.)