Data Science at the Command Line
Facing the Future with Time-Tested Tools
Publisher: O'Reilly Media
Final Release Date: September 2014
Pages: 212

This hands-on guide demonstrates how the flexibility of the command line can help you become a more efficient and productive data scientist. You’ll learn how to combine small, yet powerful, command-line tools to quickly obtain, scrub, explore, and model your data.

To get you started—whether you’re on Windows, OS X, or Linux—author Jeroen Janssens introduces the Data Science Toolbox, an easy-to-install virtual environment packed with over 80 command-line tools.

Discover why the command line is an agile, scalable, and extensible technology. Even if you’re already comfortable processing data with, say, Python or R, you’ll greatly improve your data science workflow by also leveraging the power of the command line.

  • Obtain data from websites, APIs, databases, and spreadsheets
  • Perform scrub operations on plain text, CSV, HTML/XML, and JSON
  • Explore data, compute descriptive statistics, and create visualizations
  • Manage your data science workflow using Drake
  • Create reusable tools from one-liners and existing Python or R code
  • Parallelize and distribute data-intensive pipelines using GNU Parallel
  • Model data with dimensionality reduction, clustering, regression, and classification algorithms

Customer Reviews

4.5 (based on 11 reviews)

Ratings Distribution

  • 5 Stars (6)
  • 4 Stars (5)
  • 3 Stars (0)
  • 2 Stars (0)
  • 1 Star (0)

100% of respondents would recommend this to a friend.

Pros

  • Helpful examples (10)
  • Easy to understand (9)
  • Well-written (9)
  • Concise (8)
  • Accurate (7)

Cons

Best Uses

  • Intermediate (11)
  • Expert (5)
  • Novice (5)

Reviewer Profile

  • Developer (6)

    Reviewed by 11 customers

    Displaying reviews 1-10

     
    4.0

    Whoa! So many tools in the same terminal

    By searchs

    from London, UK

    Verified Buyer

    Pros

    • Accurate
    • Easy to understand
    • Helpful examples
    • Well-written

    Cons

      Best Uses

      • Intermediate
      • Novice
      • Student

      Comments about oreilly Data Science at the Command Line:

      I now easily process JSON, perform counts, and squeeze out more information using the command line, without having to go online to search for a gem, a plugin, or another IDE.

      (1 of 1 customers found this review helpful)

       
      5.0

      A must-have for data scientists

      By Christophe Lalanne

      from Paris, France

      About Me Statistician

      Verified Buyer

      Pros

      • Concise
      • Easy to understand
      • Helpful examples

      Cons

        Best Uses

        • Expert
        • Intermediate

        Comments about oreilly Data Science at the Command Line:

        John M. Chambers advocated long ago that many operations on text files can be handled with programs like Sed, Awk, or Perl before feeding the data to a statistical package. This book goes further, showing with many illustrations how to use GNU software and modern tools to retrieve and preprocess data, and how to build efficient workflows for data analysis in various settings.

        (1 of 1 customers found this review helpful)

         
        4.0

        Quick and useful

        By Gabe

        from Chicago, IL

        About Me Scientist

        Verified Reviewer

        Pros

        • Concise
        • Easy to understand
        • Helpful examples
        • Well-written

        Cons

          Best Uses

          • Intermediate

          Comments about oreilly Data Science at the Command Line:

          This is a quick and useful tour of command line utilities for munging and visualizing data. All of the information is available free online if you're willing to scour for it, but this book puts a lot of nice code all in one place.

          (1 of 1 customers found this review helpful)

           
          4.0

          great book

          By rvjansen

          from Amsterdam

          About Me Designer, Developer, Maker, Sys Admin

          Verified Buyer

          Pros

          • Accurate
          • Concise
          • Easy to understand
          • Helpful examples
          • Well-written

          Cons

            Best Uses

            • Intermediate
            • Novice

            Comments about oreilly Data Science at the Command Line:

            I like everything in this book, as it shows how the basic Unix philosophy of chaining pipes of text is still the most valuable paradigm when any reasonable amount of work needs to be done.

            I have a problem with the PDF in Mac Preview, though. This might be a bug in Yosemite Preview, but this is still the only file that triggers it. The front picture distorts, and my laptop crawls after that, with spinning beachballs. Maybe someone ought to review the technical merits of this PDF. Please inform me if there is a new version.

            (1 of 1 customers found this review helpful)

             
            5.0

            Really helps to change our mindset

            By Seb Portebois

            from Montréal, QC, Canada

            About Me Developer

            Verified Buyer

            Pros

            • Helpful examples
            • Well-written

            Cons

              Best Uses

              • Intermediate

              Comments about oreilly Data Science at the Command Line:

              I am not a data scientist, but I do a lot of data processing and already used the command line for it.
              Data Science at the Command Line really helped me take one big step further, improve my skill set, and change my mindset.

              (2 of 3 customers found this review helpful)

               
              5.0

              Excellent

              By Steven Pennebaker

              from Soquel, CA

              About Me Developer

              Verified Reviewer

              Pros

              • Accurate
              • Concise
              • Easy to understand
              • Helpful examples
              • Well-written

              Cons

                Best Uses

                • Expert
                • Intermediate
                • Novice

                Comments about oreilly Data Science at the Command Line:

                This is an excellent book. Thorough and clear, it has enough basic information for beginners but even intermediate and advanced users will pick up plenty of new tricks. When I've had to solve these types of problems in the past, I've leaned pretty heavily on AWK and, to a lesser extent, XSL (!). This book introduced me to a bunch of utilities that were new to me and reminded me of a few old friends I haven't used in years.

                (1 of 1 customers found this review helpful)

                 
                5.0

                Unleash the power of the command line

                By Brian

                from Arlington, VA

                About Me Data Scientist

                Pros

                • Accurate
                • Concise
                • Easy to understand
                • Helpful examples
                • Well-written

                Cons

                  Best Uses

                  • Intermediate
                  • Novice

                  Comments about oreilly Data Science at the Command Line:

                  As a data scientist new to the field, this book was an invaluable resource in grasping some really important skills I had seen but hadn't used. Jeroen is clear, concise, and has given me a new perspective on my field. A must read.

                  (1 of 1 customers found this review helpful)

                   
                  4.0

                  First of its kind

                  By Anthony Georgilas

                  from Kalamata, Hellas

                  Verified Reviewer

                  Pros

                  • Accurate
                  • Concise
                  • Easy to understand
                  • Helpful examples
                  • Well-written

                  Cons

                    Best Uses

                    • Intermediate

                    Comments about oreilly Data Science at the Command Line:

                    As far as I know, most books on data analysis focus on using R or similar statistical software. I think it was about time for a different approach. One can get more work done at the command line in less time. Thanks to Mr. Janssens, we have a reference to most of the useful tools for data analysis out there. I'm not an expert Terminal user (I've been using graphical OSes since the early nineties) and, to be honest, I didn't know how much power is hidden in it. My only complaint is that, for the time being, VirtualBox is needed for easy installation of all the tools mentioned in the book. As a Mac user, I would like a straightforward way to install the tools, but I'm sure Jeroen will fix this.

                    That said, I would like to express my appreciation to the author.

                    (1 of 1 customers found this review helpful)

                     
                    4.0

                    Fast and powerful

                    By Clive

                    from Johannesburg, RSA

                    About Me Developer, QA

                    Verified Buyer

                    Pros

                    • Concise
                    • Easy to understand

                    Cons

                      Best Uses

                      • Expert
                      • Intermediate
                      • Novice
                      • Student

                      Comments about oreilly Data Science at the Command Line:

                      I love using command-line tools like curl, and any tips and tricks are priceless. The command-line tools out there are fabulous, and this book can only add to the data scientist's repertoire.
                      The scripts are reusable, take minutes to construct, and are easy for others in the project to read, understand, and use.

                      (3 of 3 customers found this review helpful)

                       
                      5.0

                      Review of the completed book

                      By Vijay N Phadke

                      from Fremont, CA

                      About Me Developer

                      Verified Reviewer

                      Pros

                      • Accurate
                      • Concise
                      • Easy to understand
                      • Helpful examples
                      • Well-written

                      Cons

                        Best Uses

                        • Expert
                        • Intermediate

                        Comments about oreilly Data Science at the Command Line:

                        Recently I got an updated copy of the book, which is now almost a finished product except for a couple of chapters.
                        The book now has excellent content throughout, covering all aspects of data manipulation using only the command line. I found chapter 8, Parallel Pipelines, quite interesting: it shows how to do distributed data processing on remote machines with GNU Parallel, almost a la Hadoop but without the complexity! This could be a real benefit for tasks that just don't have the luxury of using Hadoop.
                        The book now deserves 5 stars for making data wrangling on the command line cool again!

                        Buying Options
                        Ebook: $33.99
                        Formats:  DAISY, ePub, Mobi, PDF
                        Print & Ebook: $43.99
                        Print: $39.99