Data Science Starter Kit
The Tools You Need to Get Started with Data
"The success of companies like Google, Facebook, Amazon, and Netflix, not to mention Wall Street firms and industries from manufacturing and retail to healthcare, is increasingly driven by better tools for extracting meaning from very large quantities of data. 'Data Scientist' is now the hottest job title in Silicon Valley."
– Tim O'Reilly
From basic statistics to machine learning and new ways to think about visualization, the Data Science Starter Kit gives you the tools you need to get started with data. If you haven't yet taken the leap, why wait? And if you're already experienced with data, the Starter Kit will push you further. The package includes 13 titles on R, data analysis, Python, machine learning, and visualization.
This kit includes everything you need, from analysis and visualization to management.
Buy any two titles and get the third free with discount code OPC10.
Or, buy them all for just $169.99 ($288 savings)
Data Science for Business: Based on an MBA course that author Foster Provost has taught at New York University over the past ten years, this book uses examples of real-world business problems to illustrate the fundamental principles of data science. You'll learn not only how to improve communication between business stakeholders and data scientists, but also how to participate intelligently in your company's data science projects.
Doing Data Science: This insightful book, based on Columbia University's Introduction to Data Science class, tells you what you need to know. In many of these chapter-long lectures, data scientists from companies such as Google, Microsoft, and eBay share new algorithms, methods, and models by presenting case studies and the code they use. If you're familiar with linear algebra, probability, and statistics, and have programming experience, this book is an ideal introduction to data science.
Agile Data Science: Using lightweight tools such as Python, Apache Pig, and the D3.js library, your team will create an agile environment for exploring data, starting with an example application to mine your own email inboxes. You'll learn an iterative approach that enables you to quickly change the kind of analysis you're doing, depending on what the data is telling you.
Bad Data Handbook: What is bad data? Some people consider it a technical phenomenon, like missing values or malformed records, but bad data includes a lot more. In this handbook, data expert Q. Ethan McCallum has gathered 19 colleagues from every corner of the data arena to reveal how they've recovered from nasty data problems.
Data Analysis with Open Source Tools: A practitioner's survey of data analysis. From histograms to machine learning, this book presents the tools you need to make sense of data. You'll learn how to look at data to discover what it contains, how to capture those ideas in conceptual models, and then feed your understanding back into the organization through business plans, metrics dashboards, and other applications.
Python for Data Analysis: This book is concerned with the nuts and bolts of manipulating, processing, cleaning, and crunching data in Python. It is also a practical, modern introduction to scientific computing in Python, tailored for data-intensive applications. It covers the parts of the Python language and libraries you'll need to effectively solve a broad set of data analysis problems. It is not an exposition on analytical methods using Python as the implementation language.
Machine Learning for Hackers: If you're an experienced programmer interested in crunching data, this book will get you started with machine learning—a toolkit of algorithms that enables computers to train themselves to automate useful tasks. Each chapter focuses on a specific problem in machine learning, such as classification, prediction, optimization, and recommendation.
Mining the Social Web, 2nd Edition: How can you tap into the wealth of social web data to discover who's making connections with whom, what they're talking about, and where they're located? With this expanded and thoroughly revised edition, you'll learn how to acquire, analyze, and summarize data from all corners of the social web, including Facebook, Twitter, LinkedIn, Google+, GitHub, email, websites, and blogs.
R Cookbook: Over 200 recipes for R users, ranging from the basic to the esoteric. Why re-invent the wheel? This collection of concise, task-oriented recipes makes you productive with R immediately, with solutions ranging from basic tasks to input and output, general statistics, graphics, and linear regression.
R in a Nutshell, 2nd Edition: The authoritative guide to what has become the de facto standard for statistical programming. R in a Nutshell provides a quick and practical way to learn this increasingly popular open source language and environment.
MapReduce Design Patterns: Each pattern is explained in context, with pitfalls and caveats clearly identified to help you avoid common design mistakes when modeling your big data architecture. This book also provides a complete overview of MapReduce that explains its origins and implementations, and why design patterns are so important. All code examples are written for Hadoop.
Feedback Control for Computer Systems: Author Philipp K. Janert demonstrates how the same principles that govern cruise control in your car also apply to data center management and other enterprise systems. Through case studies and hands-on simulations, you'll learn methods to solve several control issues, including mechanisms to spin up more servers automatically when web traffic spikes.