Data Science at the Command Line
Facing the Future with Time-Tested Tools
Publisher: O'Reilly Media
Final Release Date: September 2014
Pages: 212

This hands-on guide demonstrates how the flexibility of the command line can help you become a more efficient and productive data scientist. You’ll learn how to combine small, yet powerful, command-line tools to quickly obtain, scrub, explore, and model your data.

To get you started—whether you’re on Windows, OS X, or Linux—author Jeroen Janssens introduces the Data Science Toolbox, an easy-to-install virtual environment packed with over 80 command-line tools.

Discover why the command line is an agile, scalable, and extensible technology. Even if you’re already comfortable processing data with, say, Python or R, you’ll greatly improve your data science workflow by also leveraging the power of the command line.

  • Obtain data from websites, APIs, databases, and spreadsheets
  • Perform scrub operations on plain text, CSV, HTML/XML, and JSON
  • Explore data, compute descriptive statistics, and create visualizations
  • Manage your data science workflow using Drake
  • Create reusable tools from one-liners and existing Python or R code
  • Parallelize and distribute data-intensive pipelines using GNU Parallel
  • Model data with dimensionality reduction, clustering, regression, and classification algorithms
Customer Reviews

REVIEW SNAPSHOT® by PowerReviews

O'Reilly Data Science at the Command Line: 4.6 (based on 8 reviews)

Ratings Distribution

  • 5 Stars (5)
  • 4 Stars (3)
  • 3 Stars (0)
  • 2 Stars (0)
  • 1 Star (0)

100% of respondents would recommend this to a friend.

Pros

  • Helpful examples (7)
  • Well-written (7)
  • Accurate (6)
  • Concise (6)
  • Easy to understand (6)

Best Uses

  • Intermediate (8)
  • Expert (4)
  • Novice (4)

Reviewer Profile

  • Developer (6)

Reviewed by 8 customers


(1 of 1 customers found this review helpful)

4.0

great book

By rvjansen
from Amsterdam
About Me: Designer, Developer, Maker, Sys Admin
Verified Buyer

Pros

  • Accurate
  • Concise
  • Easy to understand
  • Helpful examples
  • Well-written

Best Uses

  • Intermediate
  • Novice

Comments about O'Reilly Data Science at the Command Line:

I like everything in this book, as it shows how the basic Unix philosophy of chaining pipes of text is still the most valuable paradigm if any reasonable amount of work needs to be done.

I have a problem with the PDF in Mac Preview, though. This might be a bug in Yosemite Preview, but this is still the only file that triggers it. The front picture distorts, and my laptop crawls after that, with spinning beachballs. Maybe someone ought to review the technical merits of this PDF. Please inform me if there is a new version.

       
5.0

Really helps to change our mindset

By Seb Portebois
from Montréal, QC, Canada
About Me: Developer
Verified Buyer

Pros

  • Helpful examples
  • Well-written

Best Uses

  • Intermediate

Comments about O'Reilly Data Science at the Command Line:

I am not a data scientist, but I do a lot of data treatment, and I already used the command line for that.
Data Science at the Command Line really helped me go one big step further, improve my skillset, and change my mindset.

(0 of 1 customers found this review helpful)

5.0

Excellent

By Steven Pennebaker
from Soquel, CA
About Me: Developer
Verified Reviewer

Pros

  • Accurate
  • Concise
  • Easy to understand
  • Helpful examples
  • Well-written

Best Uses

  • Expert
  • Intermediate
  • Novice

Comments about O'Reilly Data Science at the Command Line:

This is an excellent book. Thorough and clear, it has enough basic information for beginners but even intermediate and advanced users will pick up plenty of new tricks. When I've had to solve these types of problems in the past, I've leaned pretty heavily on AWK and, to a lesser extent, XSL (!). This book introduced me to a bunch of utilities that were new to me and reminded me of a few old friends I haven't used in years.

(1 of 1 customers found this review helpful)

5.0

Unleash the power of the command line

By Brian
from Arlington, VA
About Me: Data Scientist

Pros

  • Accurate
  • Concise
  • Easy to understand
  • Helpful examples
  • Well-written

Best Uses

  • Intermediate
  • Novice

Comments about O'Reilly Data Science at the Command Line:

As a data scientist new to the field, this book was an invaluable resource in grasping some really important skills I had seen but hadn't used. Jeroen is clear, concise, and has given me a new perspective on my field. A must read.

(1 of 1 customers found this review helpful)

4.0

First of its kind

By Anthony Georgilas
from Kalamata, Hellas
Verified Reviewer

Pros

  • Accurate
  • Concise
  • Easy to understand
  • Helpful examples
  • Well-written

Best Uses

  • Intermediate

Comments about O'Reilly Data Science at the Command Line:

As far as I know, most books on data analysis focus on using R or similar statistical software. I think it was about time for a different approach. One can get more work done at the command line in less time. Thanks to Mr. Janssens, we have a reference to most of the useful tools for data analysis out there. I'm not an expert Terminal user (I've been using graphical OSes since the early nineties) and, to be honest, I didn't know how much power is hidden in it. My only complaint is that, for the time being, VirtualBox is needed for easy installation of all the tools mentioned in the book. As a Mac user, it would be nice to have a straightforward way to install the tools, but I'm sure Jeroen will fix this.

That said, I would like to express my appreciation to the author.

(1 of 1 customers found this review helpful)

4.0

Fast and powerful

By Clive
from Johannesburg, RSA
About Me: Developer, QA
Verified Buyer

Pros

  • Concise
  • Easy to understand

Best Uses

  • Expert
  • Intermediate
  • Novice
  • Student

Comments about O'Reilly Data Science at the Command Line:

I love using command-line tools like curl, and any tips and tricks are priceless. The command-line tools out there are fabulous, and this book can only add to the data scientist's repertoire.
The scripts are reusable, take minutes to construct, and are easy for others in the project to read, understand, and use.

(2 of 2 customers found this review helpful)

5.0

Review of the completed book

By Vijay N Phadke
from Fremont, CA
About Me: Developer
Verified Reviewer

Pros

  • Accurate
  • Concise
  • Easy to understand
  • Helpful examples
  • Well-written

Best Uses

  • Expert
  • Intermediate

Comments about O'Reilly Data Science at the Command Line:

I recently got an updated copy of the book, which is now almost a finished product except for a couple of chapters.
The book now has excellent content throughout, covering all aspects of data manipulation using only the command line. I found Chapter 8, Parallel Pipelines, quite interesting: it shows how to do distributed data processing on remote machines with GNU Parallel, almost a la Hadoop but without the complexity! This could be a real benefit for tasks that just don't have the luxury of using Hadoop.
The book now deserves 5 stars for making data wrangling on the command line cool again!

(2 of 2 customers found this review helpful)

5.0

This is how I work

By David Huttleston Jr
from Madison, WI
About Me: Developer
Verified Buyer

Pros

  • Accurate
  • Helpful examples
  • Well-written

Best Uses

  • Expert
  • Intermediate

Comments about O'Reilly Data Science at the Command Line:

This is a fabulous guide to how to get data science *done*. The command line is intensely productive for data workflows. And data science is all about workflow. There is almost nothing done with data that doesn't have to be done many times. I personally use many of these techniques every day, and it's delightful to see someone else's take on things. I've already picked up many new tricks and tools.
Many thanks to the author!

Buying Options

  • Ebook: $33.99 (formats: DAISY, ePub, Mobi, PDF)
  • Print & Ebook: $43.99
  • Print: $39.99