Book description
Explore intuitive data analysis techniques and powerful machine learning methods using over 130 practical recipes
In Detail
This book takes you through every step involved in data analysis. It pairs Haskell with data modeling through carefully chosen examples that feature some of the most popular machine learning techniques.
You will begin by learning how to obtain and clean data from various sources, then learn how to use data structures such as trees and graphs. The core of the book covers statistical techniques, parallelism, concurrency, and machine learning algorithms, along with examples of visualizing and exporting results. By the end of the book, you will have the techniques you need to use Haskell effectively for data analysis.
What You Will Learn
- Obtain and analyze raw data from various sources including text files, CSV files, databases, and websites
- Implement practical tree and graph algorithms on various datasets
- Apply statistical methods such as moving average and linear regression to understand patterns
- Fiddle with parallel and concurrent code to speed up and simplify time-consuming algorithms
- Find clusters in data using some of the most popular machine learning algorithms
- Manage results by visualizing or exporting data
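As a small illustration of the moving average mentioned in the list above, here is a minimal Haskell sketch. It is not taken from the book: the function name `movingAverage` and the sliding-window approach built on `Data.List.tails` are assumptions made for this example only.

```haskell
module Main where

import Data.List (tails)

-- | Simple moving average over a window of size n (hypothetical helper,
-- not the book's implementation).
-- Produces one average per full window. Returns an empty list when
-- n <= 0 or when the input is shorter than the window.
movingAverage :: Int -> [Double] -> [Double]
movingAverage n xs
  | n <= 0    = []
  | otherwise = [ sum w / fromIntegral n | w <- windows ]
  where
    -- Take the first n elements of every suffix, and stop as soon as a
    -- suffix is too short to fill a whole window.
    windows = takeWhile ((== n) . length) (map (take n) (tails xs))

main :: IO ()
main = print (movingAverage 3 [1, 2, 3, 4, 5])
-- prints [2.0,3.0,4.0]
```

This version recomputes each window sum from scratch, which keeps the code short. For long inputs, a running sum that adds the incoming value and subtracts the outgoing one would avoid the repeated work.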
Table of contents
- Haskell Data Analysis Cookbook
- Table of Contents
- Haskell Data Analysis Cookbook
- Credits
- About the Author
- About the Reviewers
- www.PacktPub.com
- Preface
- 1. The Hunt for Data
- Introduction
- Harnessing data from various sources
- Accumulating text data from a file path
- Catching I/O code faults
- Keeping and representing data from a CSV file
- Examining a JSON file with the aeson package
- Reading an XML file using the HXT package
- Capturing table rows from an HTML page
- Understanding how to perform HTTP GET requests
- Learning how to perform HTTP POST requests
- Traversing online directories for data
- Using MongoDB queries in Haskell
- Reading from a remote MongoDB server
- Exploring data from a SQLite database
- 2. Integrity and Inspection
- Introduction
- Trimming excess whitespace
- Ignoring punctuation and specific characters
- Coping with unexpected or missing input
- Validating records by matching regular expressions
- Lexing and parsing an e-mail address
- Deduplication of nonconflicting data items
- Deduplication of conflicting data items
- Implementing a frequency table using Data.List
- Implementing a frequency table using Data.MultiSet
- Computing the Manhattan distance
- Computing the Euclidean distance
- Comparing scaled data using the Pearson correlation coefficient
- Comparing sparse data using cosine similarity
- 3. The Science of Words
- Introduction
- Displaying a number in another base
- Reading a number from another base
- Searching for a substring using Data.ByteString
- Searching a string using the Boyer-Moore-Horspool algorithm
- Searching a string using the Rabin-Karp algorithm
- Splitting a string on lines, words, or arbitrary tokens
- Finding the longest common subsequence
- Computing a phonetic code
- Computing the edit distance
- Computing the Jaro-Winkler distance between two strings
- Finding strings within one-edit distance
- Fixing spelling mistakes
- 4. Data Hashing
- Introduction
- Hashing a primitive data type
- Hashing a custom data type
- Running popular cryptographic hash functions
- Running a cryptographic checksum on a file
- Performing fast comparisons between data types
- Using a high-performance hash table
- Using Google's CityHash hash functions for strings
- Computing a Geohash for location coordinates
- Using a bloom filter to remove unique items
- Running MurmurHash, a simple but speedy hashing algorithm
- Measuring image similarity with perceptual hashes
- 5. The Dance with Trees
- Introduction
- Defining a binary tree data type
- Defining a rose tree (multiway tree) data type
- Traversing a tree depth-first
- Traversing a tree breadth-first
- Implementing a Foldable instance for a tree
- Calculating the height of a tree
- Implementing a binary search tree data structure
- Verifying the order property of a binary search tree
- Using a self-balancing tree
- Implementing a min-heap data structure
- Encoding a string using a Huffman tree
- Decoding a Huffman code
- 6. Graph Fundamentals
- Introduction
- Representing a graph from a list of edges
- Representing a graph from an adjacency list
- Conducting a topological sort on a graph
- Traversing a graph depth-first
- Traversing a graph breadth-first
- Visualizing a graph using Graphviz
- Using Directed Acyclic Word Graphs
- Working with hexagonal and square grid networks
- Finding maximal cliques in a graph
- Determining whether any two graphs are isomorphic
- 7. Statistics and Analysis
- Introduction
- Calculating a moving average
- Calculating a moving median
- Approximating a linear regression
- Approximating a quadratic regression
- Obtaining the covariance matrix from samples
- Finding all unique pairings in a list
- Using the Pearson correlation coefficient
- Evaluating a Bayesian network
- Creating a data structure for playing cards
- Using a Markov chain to generate text
- Creating n-grams from a list
- Creating a neural network perceptron
- 8. Clustering and Classification
- Introduction
- Implementing the k-means clustering algorithm
- Implementing hierarchical clustering
- Using a hierarchical clustering library
- Finding the number of clusters
- Clustering words by their lexemes
- Classifying the parts of speech of words
- Identifying key words in a corpus of text
- Training a parts-of-speech tagger
- Implementing a decision tree classifier
- Implementing a k-Nearest Neighbors classifier
- Visualizing points using Graphics.EasyPlot
- 9. Parallel and Concurrent Design
- Introduction
- Using the Haskell Runtime System options
- Evaluating a procedure in parallel
- Controlling parallel algorithms in sequence
- Forking I/O actions for concurrency
- Communicating with a forked I/O action
- Killing forked threads
- Parallelizing pure functions using the Par monad
- Mapping over a list in parallel
- Accessing tuple elements in parallel
- Implementing MapReduce to count word frequencies
- Manipulating images in parallel using Repa
- Benchmarking runtime performance in Haskell
- Using the criterion package to measure performance
- Benchmarking runtime performance in the terminal
- 10. Real-time Data
- Introduction
- Streaming Twitter for real-time sentiment analysis
- Reading IRC chat room messages
- Responding to IRC messages
- Polling a web server for latest updates
- Detecting real-time file directory changes
- Communicating in real time through sockets
- Detecting faces and eyes through a camera stream
- Streaming camera frames for template matching
- 11. Visualizing Data
- Introduction
- Plotting a line chart using Google's Chart API
- Plotting a pie chart using Google's Chart API
- Plotting bar graphs using Google's Chart API
- Displaying a line graph using gnuplot
- Displaying a scatter plot of two-dimensional points
- Interacting with points in a three-dimensional space
- Visualizing a graph network
- Customizing the looks of a graph network diagram
- Rendering a bar graph in JavaScript using D3.js
- Rendering a scatter plot in JavaScript using D3.js
- Diagramming a path from a list of vectors
- 12. Exporting and Presenting
- Index
Product information
- Title: Haskell Data Analysis Cookbook
- Author(s):
- Release date: June 2014
- Publisher(s): Packt Publishing
- ISBN: 9781783286331