Table of Contents

  1. Chapter 1 Introduction

    1. Overview

    2. Data Science Is OSEMN

    3. Intermezzo Chapters

    4. What Is the Command Line?

    5. Why Data Science at the Command Line?

    6. A Real-World Use Case

    7. Further Reading

  2. Chapter 2 Getting Started

    1. Overview

    2. Setting Up Your Data Science Toolbox

    3. Essential Concepts and Tools

    4. Further Reading

  3. Chapter 3 Obtaining Data

    1. Overview

    2. Copying Local Files to the Data Science Toolbox

    3. Decompressing Files

    4. Converting Microsoft Excel Spreadsheets

    5. Querying Relational Databases

    6. Downloading from the Internet

    7. Calling Web APIs

    8. Further Reading

  4. Chapter 4 Creating Reusable Command-Line Tools

    1. Overview

    2. Converting One-Liners into Shell Scripts

    3. Creating Command-Line Tools with Python and R

    4. Further Reading

  5. Chapter 5 Scrubbing Data

    1. Overview

    2. Common Scrub Operations for Plain Text

    3. Working with CSV

    4. Working with HTML/XML and JSON

    5. Common Scrub Operations for CSV

    6. Further Reading

  6. Chapter 6 Managing Your Data Workflow

    1. Overview

    2. Introducing Drake

    3. Installing Drake

    4. Obtain Top Ebooks from Project Gutenberg

    5. Every Workflow Starts with a Single Step

    6. Well, That Depends

    7. Rebuilding Specific Targets

    8. Discussion

    9. Further Reading

  7. Chapter 7 Exploring Data

    1. Overview

    2. Inspecting Data and Its Properties

    3. Computing Descriptive Statistics

    4. Creating Visualizations

    5. Further Reading

  8. Chapter 8 Parallel Pipelines

    1. Overview

    2. Serial Processing

    3. Parallel Processing

    4. Distributed Processing

    5. Discussion

    6. Further Reading

  9. Chapter 9 Modeling Data

    1. Overview

    2. More Wine, Please!

    3. Dimensionality Reduction with Tapkee

    4. Clustering with Weka

    5. Regression with SciKit-Learn Laboratory

    6. Classification with BigML

    7. Further Reading

  10. Chapter 10 Conclusion

    1. Let’s Recap

    2. Three Pieces of Advice

    3. Where to Go from Here?

    4. Getting in Touch

  11. Appendix List of Command-Line Tools

    1. alias

    2. awk

    3. aws

    4. bash

    5. bc

    6. bigmler

    7. body

    8. cat

    9. cd

    10. chmod

    11. cols

    12. cowsay

    13. cp

    14. csvcut

    15. csvgrep

    16. csvjoin

    17. csvlook

    18. csvsort

    19. csvsql

    20. csvstack

    21. csvstat

    22. curl

    23. curlicue

    24. cut

    25. display

    26. drake

    27. dseq

    28. echo

    29. env

    30. export

    31. feedgnuplot

    32. fieldsplit

    33. find

    34. for

    35. git

    36. grep

    37. head

    38. header

    39. in2csv

    40. jq

    41. json2csv

    42. less

    43. ls

    44. man

    45. mkdir

    46. mv

    47. parallel

    48. paste

    49. pbc

    50. pip

    51. pwd

    52. python

    53. R

    54. Rio

    55. Rio-scatter

    56. rm

    57. run_experiment

    58. sample

    59. scp

    60. scrape

    61. sed

    62. seq

    63. shuf

    64. sort

    65. split

    66. sql2csv

    67. ssh

    68. sudo

    69. tail

    70. tapkee

    71. tar

    72. tee

    73. tr

    74. tree

    75. type

    76. uniq

    77. unpack

    78. unrar

    79. unzip

    80. wc

    81. weka

    82. which

    83. xml2json

  12. Appendix Bibliography