Until recently, Hadoop deployments ran on hardware owned and operated by the organizations using them. Now cloud service providers let customers effectively rent hardware and the associated network connectivity. But there’s a lot more to installing a Hadoop cluster in the public cloud than simply renting machines.
This practical book shows developers and systems administrators familiar with Hadoop how to install, use, and manage cloud-born clusters efficiently. You’ll learn how to architect clusters in a way that works with cloud-provider features—not just to avoid potential pitfalls, but also to take full advantage of what these services can do. You’ll also compare the Amazon, Google, and Microsoft clouds, and learn how to set up clusters in each of them.
Learn the advantages and disadvantages of running Hadoop in the cloud
Get a cloud primer on instances, networking and security, and storage
Build a simple Hadoop cloud cluster, and run a MapReduce job
Explore use cases for high availability, relational data with Hive, and complex analytics with Spark
Learn best practices for designing and managing cloud clusters, including network topologies, day-to-day tasks, and troubleshooting
Chapter 1. Why Hadoop in the Cloud?
Chapter 2. Overview and Comparison of Cloud Providers
Chapter 4. Networking and Security
Chapter 6. Setting Up in AWS
Chapter 7. Setting Up in Google Cloud Platform
Chapter 8. Setting Up in Azure
Chapter 9. Standing Up a Cluster
Chapter 10. High Availability
Chapter 11. Relational Data with Apache Hive
Chapter 12. Streaming in the Cloud with Apache Spark
Bill Havanki is a software engineer at Cloudera, where he has contributed to Hadoop components as well as to systems for deploying Hadoop clusters into public cloud services. Before joining Cloudera, he spent 15 years developing software for government contracts, focusing mostly on analytic frameworks and on authentication and authorization systems. He earned his B.S. in Electrical Engineering from Rutgers University and his M.S. in Computer Engineering from North Carolina State University. A New Jersey native, he currently lives near Annapolis, Maryland, with his family.