Data Science at the Command Line
Facing the Future with Time-Tested Tools
Publisher: O'Reilly Media
Final Release Date: June 2014
Pages: 208


In this practical guide, you’ll learn how to leverage the power of the command line for doing data science. By combining small, yet powerful, command-line tools, you can quickly obtain, scrub, explore, and model your data. Even if you’re already comfortable processing data with R or Python, being able to integrate the command line into your existing workflow will make you a more efficient and productive data scientist.

  • Learn essential concepts and built-in commands of the *nix command line
  • Get started with your own Data Science Toolbox on either Linux, Mac OS X, or Microsoft Windows
  • Use classic command-line tools such as grep, sed, and awk
  • Obtain data from websites, APIs, databases, and spreadsheets
  • Parallelize and distribute data-intensive pipelines to remote machines, including AWS EC2
  • Clean data in CSV, JSON, and XML/HTML formats using csvkit, jq, and scrape
  • Apply dimensionality reduction, clustering, regression, and classification algorithms
  • Visualize data and results from the command line using gnuplot and ggplot
  • Turn Bash one-liners and existing Python and R code into reusable command-line tools
Customer Reviews

Average rating: 4.6 out of 5 (based on 5 reviews)

Ratings Distribution

  • 5 Stars: 3
  • 4 Stars: 2
  • 3 Stars: 0
  • 2 Stars: 0
  • 1 Star: 0

100% of respondents would recommend this to a friend.

Pros

  • Accurate (4)
  • Concise (4)
  • Easy to understand (4)
  • Helpful examples (4)
  • Well-written (4)

Cons: none listed

Best Uses

  • Intermediate (5)
  • Expert (3)

Reviewer Profile

  • Developer (3)

Rating: 5.0

Unleash the power of the command line

By Brian from Arlington, VA
About Me: Data Scientist

Pros

  • Accurate
  • Concise
  • Easy to understand
  • Helpful examples
  • Well-written

Cons: none listed

Best Uses

  • Intermediate
  • Novice

Comments about O'Reilly Data Science at the Command Line:

As a data scientist new to the field, I found this book an invaluable resource for grasping some really important skills I had seen but hadn't used. Jeroen is clear, concise, and has given me a new perspective on my field. A must read.

       
Rating: 4.0

First of its kind

By Anthony Georgilas from Kalamata, Hellas
Verified Reviewer

Pros

  • Accurate
  • Concise
  • Easy to understand
  • Helpful examples
  • Well-written

Cons: none listed

Best Uses

  • Intermediate

Comments about O'Reilly Data Science at the Command Line:

As far as I know, most books on data analysis focus on using R or similar statistical software. I think it was about time for a different approach: one can get more work done at the command line in less time. Thanks to Mr. Janssens, we have a reference to most of the useful tools for data analysis out there. I'm not an expert Terminal user (I've been using graphical OSes since the early nineties) and, to be honest, I didn't know how much power was hidden in it. My only complaint is that, for the time being, VirtualBox is needed for easy installation of all the tools mentioned in the book. As a Mac user, I would like a straightforward way to install the tools, but I'm sure Jeroen will fix this.

That said, I would like to express my appreciation to the author.

(1 of 1 customers found this review helpful)

         
Rating: 4.0

Fast and powerful

By Clive from Johannesburg, RSA
About Me: Developer, QA
Verified Buyer

Pros

  • Concise
  • Easy to understand

Cons: none listed

Best Uses

  • Expert
  • Intermediate
  • Novice
  • Student

Comments about O'Reilly Data Science at the Command Line:

I love using command-line tools like curl, and any tips and tricks are priceless. The command-line tools out there are fabulous, and this book can only add to the data scientist's repertoire. The scripts are reusable, take minutes to construct, and are easy for others in the project to read, understand, and use.

(1 of 1 customers found this review helpful)

           
Rating: 5.0

Review of the completed book

By Vijay N Phadke from Fremont, CA
About Me: Developer
Verified Reviewer

Pros

  • Accurate
  • Concise
  • Easy to understand
  • Helpful examples
  • Well-written

Cons: none listed

Best Uses

  • Expert
  • Intermediate

Comments about O'Reilly Data Science at the Command Line:

Recently I got an updated copy of the book, which is now almost a finished product except for a couple of chapters. The book now has excellent content throughout, covering all aspects of data manipulation solely using the command line. I found Chapter 8, Parallel Pipelines, quite interesting: it shows how to do distributed data processing on remote machines with GNU Parallel, almost à la Hadoop but without the complexity! This could be a real benefit for tasks that just don't have the luxury of using Hadoop. The book now deserves 5 stars for making data wrangling on the command line cool again!

(2 of 2 customers found this review helpful)

             
Rating: 5.0

This is how I work

By David Huttleston Jr from Madison, WI
About Me: Developer
Verified Buyer

Pros

  • Accurate
  • Helpful examples
  • Well-written

Cons: none listed

Best Uses

  • Expert
  • Intermediate

Comments about O'Reilly Data Science at the Command Line:

This is a fabulous guide to how to get data science *done*. The command line is intensely productive for data workflows, and data science is all about workflow. There is almost nothing done with data that doesn't have to be done many times. I personally use many of these techniques every day, and it's delightful to see someone else's take on things. I've already picked up many new tricks and tools. Many thanks to the author!

              Buying Options
  • Print (pre-order): $39.99, estimated availability October 2014