Recording data and analyzing it later isn't just for the entrepreneurially minded, although it does lead to some amazing discoveries. In business and government, organizations that base their decisions on data tend to be more successful than those relying on decades of trial and error. To make these data-based decisions, the data needs to be collected, processed, and distilled into a form humans can understand. It is the IT department that is responsible for connecting these data pipes and separating the signal from the noise. Those involved in data wrangling would have easier jobs if the processing power of a single machine could keep up with the size of the data they need to process. It hasn't, and over the past decade the new field of big data has emerged, focused on reliably processing massive amounts of data. Large amounts of data have been processed in the past; what is new this time around is that the work is being done on commodity hardware with open-source software tools.
In this bibliography we have selected the most useful books for data analysis. The list starts with the high-level concepts of business intelligence, data analysis, and data mining. It then works its way down to the tools needed for number crunching: mathematical toolkits, machine learning, and natural language processing. To implement these concepts beyond the "toy" stage, infrastructure tools are covered in the Cloud Services and Infrastructure and Amazon Web Services sections. Finally, big data tools that can be deployed in the cloud or locally are covered in the Hadoop and NoSQL Data Stores sections.