Date: 2017-02-13
With Early Release ebooks, you get books in their earliest form, the author's raw and unedited content as he or she writes, so you can take advantage of these technologies long before the official release of these titles. You’ll also receive updates when significant changes are made, new chapters are available, and the final ebook bundle is released.
Tackle a variety of tasks in natural language processing by learning how to use the R language and tidy data principles. This practical guide provides examples and resources to help you get up to speed with dplyr, broom, ggplot2, and other tidy tools from the R ecosystem. You’ll discover how tidy data principles can make text mining easier, more effective, and consistent by employing tools already in wide use.
Text Mining with R shows you how to manipulate, summarize, and visualize the characteristics of text using sentiment analysis, tf-idf, and topic modeling. Along with tidy data methods, you’ll also examine several beginning-to-end tidy text analyses on data sources ranging from Twitter to NASA datasets. These analyses bring together multiple text mining approaches covered in the book.
Get real-world examples for implementing text mining using tidy R packages
Understand natural language processing concepts like sentiment analysis, tf-idf, and topic modeling
Learn how to analyze unstructured, text-heavy data using the R language and ecosystem
Chapter 1: Introduction
Chapter 2: The tidy text format
Chapter 3: Sentiment analysis with tidy data
Chapter 4: Analyzing word and document frequency: tf-idf
Chapter 5: Working with combinations of words using n-grams and widyr
Chapter 6: Tidying and casting document-term matrices
Julia Silge is a data scientist at Datassist where her work involves analyzing and modeling complex data sets while communicating about technical topics with diverse audiences. She has a PhD in Astrophysics, as well as abiding affections for Jane Austen and making beautiful charts. Julia worked in academia and ed tech before moving into data science and discovering R.
David Robinson is a data scientist at Stack Overflow. He has a PhD in Quantitative and Computational Biology from Princeton University, where he worked with Professor John Storey on genomic analysis. He enjoys working on and blogging about statistics, R programming, and text mining, including a popular analysis of Donald Trump’s Twitter account (performed according to the tidy data principles described in this book).