Data Wrangling with Python
Tips and Tools to Make Your Life Easier

Publisher: O'Reilly Media
Final Release Date: February 2016
Pages: 508

How do you take your data analysis skills beyond Excel to the next level? By learning just enough Python to get stuff done. This hands-on guide shows non-programmers like you how to process information that’s initially too messy or difficult to access. You don't need to know a thing about the Python programming language to get started. Through various step-by-step exercises, you’ll learn how to acquire, clean, analyze, and present data efficiently. You’ll also discover how to automate your data process, schedule file-editing and clean-up tasks, process larger datasets, and create compelling stories with data you obtain.

Quickly learn basic Python syntax, data types, and language concepts

Work with both machine-readable and human-consumable data

Scrape websites and APIs to find a bounty of useful information

Clean and format data to eliminate duplicates and errors in your datasets

Learn when to standardize data and when to test and script data cleanup

Explore and analyze your datasets with new Python libraries and techniques

Chapter 1 Introduction to Python
    Why Python; Getting Started with Python; Summary

Chapter 2 Python Basics
    Basic Data Types; Data Containers; What Can the Various Data Types Do?; Helpful Tools: type, dir, and help; Putting It All Together; What Does It All Mean?; Summary

Chapter 3 Data Meant to Be Read by Machines
    CSV Data; JSON Data; XML Data; Summary

Chapter 4 Working with Excel Files
    Installing Python Packages; Parsing Excel Files; Getting Started with Parsing; Summary

Chapter 5 PDFs and Problem Solving in Python
    Avoid Using PDFs!; Programmatic Approaches to PDF Parsing; Parsing PDFs Using pdfminer; Learning How to Solve Problems; Uncommon File Types; Summary

Chapter 6 Acquiring and Storing Data
    Not All Data Is Created Equal; Fact Checking; Readability, Cleanliness, and Longevity; Where to Find Data; Case Studies: Example Data Investigation; Storing Your Data: When, Why, and How?; Databases: A Brief Introduction; When to Use a Simple File; Alternative Data Storage; Summary

Chapter 7 Data Cleanup: Investigation, Matching, and Formatting
    Why Clean Data?; Data Cleanup Basics; Summary

Chapter 8 Data Cleanup: Standardizing and Scripting
    Normalizing and Standardizing Your Data; Saving Your Data; Determining What Data Cleanup Is Right for Your Project; Scripting Your Cleanup; Testing with New Data; Summary

Chapter 9 Data Exploration and Analysis
    Exploring Your Data; Analyzing Your Data; Summary

Chapter 10 Presenting Your Data
    Avoiding Storytelling Pitfalls; Visualizing Your Data; Presentation Tools; Publishing Your Data; Summary

Chapter 11 Web Scraping: Acquiring and Storing Data from the Web
    What to Scrape and How; Analyzing a Web Page; Getting Pages: How to Request on the Internet; Reading a Web Page with Beautiful Soup; Reading a Web Page with LXML; Summary

Chapter 12 Advanced Web Scraping: Screen Scrapers and Spiders
    Browser-Based Parsing; Spidering the Web; Networks: How the Internet Works and Why It's Breaking Your Script; The Changing Web (or Why Your Script Broke); A (Few) Word(s) of Caution; Summary

Chapter 13 APIs
    API Features; A Simple Data Pull from Twitter's REST API; Advanced Data Collection from Twitter's REST API; Advanced Data Collection from Twitter's Streaming API; Summary

Chapter 14 Automation and Scaling
    Why Automate?; Steps to Automate; What Could Go Wrong?; Where to Automate; Special Tools for Automation; Simple Automation; Large-Scale Automation; Monitoring Your Automation; No System Is Foolproof; Summary

Chapter 15 Conclusion
    Duties of a Data Wrangler; Beyond Data Wrangling; Where Do You Go from Here?

Appendix Comparison of Languages Mentioned
    C, C++, and Java Versus Python; R or MATLAB Versus Python; HTML Versus Python; JavaScript Versus Python; Node.js Versus Python; Ruby and Ruby on Rails Versus Python

Appendix Python Resources for Beginners
    Online Resources; In-Person Groups

Appendix Learning the Command Line
    Bash; Windows CMD/Power Shell

Appendix Advanced Python Setup
    Step 1: Install GCC; Step 2: (Mac Only) Install Homebrew; Step 3: (Mac Only) Tell Your System Where to Find Homebrew; Step 4: Install Python 2.7; Step 5: Install virtualenv (Windows, Mac, Linux); Step 6: Set Up a New Directory; Step 7: Install virtualenvwrapper; Learning About Our New Environment (Windows, Mac, Linux); Advanced Setup Review

Appendix Python Gotchas
    Hail the Whitespace; The Dreaded GIL; = Versus == Versus is, and When to Just Copy; Default Function Arguments; Python Scope and Built-Ins: The Importance of Variable Names; Defining Objects Versus Modifying Objects; Changing Immutable Objects; Type Checking; Catching Multiple Exceptions; The Power of Debugging

Appendix IPython Hints
    Why Use IPython?; Getting Started with IPython; Magic Functions; Final Thoughts: A Simpler Terminal

Appendix Using Amazon Web Services
    Spinning Up an AWS Server; Logging into an AWS Server

Title: Data Wrangling with Python
By: Jacqueline Kazil, Katharine Jarmul
Publisher: O'Reilly Media
Formats: Print

Pages: 508
Print ISBN: 978-1-4919-4881-1 | ISBN 10: 1-4919-4881-7
Ebook ISBN: 978-1-4919-4876-7 | ISBN 10: 1-4919-4876-0

Jacqueline Kazil
Jacqueline Kazil is a data lover. In her career, she has worked in technology, focusing on finance, government, and journalism. Most notably, she is a former Presidential Innovation Fellow and co-founded a technology organization in government called 18F. Her career has consisted of many data science and wrangling projects, including Geoq, an open source mapping workflow tool; the Congress.gov remake; and Top Secret America. She is active in the Python and data-related communities, including the Python Software Foundation, PyLadies, Women Data Science DC, and more. She teaches Python in Washington, D.C. at meetups, conferences, and mini bootcamps. She often pair programs with her sidekick, Ellie (@ellie_the_brave). You can find her on Twitter @jackiekazil or follow her blog, The coderSnorts (https://medium.com/coder-snorts).

Katharine Jarmul
Katharine Jarmul is a Python developer who enjoys data analysis and acquisition, web scraping, teaching Python, and all things Unix. She worked at small and large startups before starting her consulting career overseas. Originally from Los Angeles, she learned Python while working at the Washington Post in 2008. As one of the founders of PyLadies (http://pyladies.org/), Katharine hopes to promote diversity in Python and other open source languages through education and training. She has led numerous workshops and tutorials ranging from beginner to advanced topics in Python. For more information on upcoming trainings, reach out to her on Twitter (http://twitter.com/kjam) or her web site (http://kjamistan.com/).

Colophon
The animal on the cover of Data Wrangling with Python is a blue-lipped tree lizard (Plica umbra). Members of the Plica genus are of moderate size and, though they belong to a family commonly known as neotropical ground lizards, live mainly in trees in South America and the Caribbean. Blue-lipped tree lizards predominantly consume ants and are the only species in their genus not characterized by bunches of spines on the neck.

Many of the animals on O'Reilly covers are endangered; all of them are important to the world. To learn more about how you can help, go to animals.oreilly.com.

The cover fonts are URW Typewriter and Guardian Sans. The text font is Adobe Minion Pro; the heading font is Adobe Myriad Condensed; and the code font is Dalton Maag's Ubuntu Mono.