Python Natural Language Processing

Book description

Leverage the power of machine learning and deep learning to extract information from text data

About This Book

  • Implement Machine Learning and Deep Learning techniques for efficient natural language processing
  • Get started with NLTK and implement NLP in your applications with ease
  • Understand and interpret human languages with the power of text analysis via Python

Who This Book Is For

This book is intended for Python developers who wish to start with natural language processing and want to make their applications smarter by implementing NLP in them.

What You Will Learn

  • Focus on Python programming paradigms, which are used to develop NLP applications
  • Understand corpus analysis and the different types of data attributes
  • Learn NLP using Python libraries such as NLTK, Polyglot, spaCy, and Stanford CoreNLP
  • Learn about feature extraction and feature selection as part of feature engineering
  • Explore the advantages of vectorization in deep learning
  • Get a better understanding of the architecture of a rule-based system
  • Optimize and fine-tune supervised and unsupervised machine learning algorithms for NLP problems
  • Identify deep learning techniques for natural language processing and natural language generation problems

In Detail

This book starts by laying the foundation for natural language processing and explaining why Python is one of the best options for building an NLP-based expert system, with advantages such as community support and the availability of frameworks. It then gives you a better understanding of freely available corpora and the different types of datasets. After this, you will know how to choose a dataset for natural language processing applications and find the right NLP techniques to process sentences in datasets and understand their structure. You will also learn how to tokenize different parts of sentences and ways to analyze them.

Over the course of the book, you will explore both the semantic and the syntactic analysis of text. You will understand how to resolve various ambiguities in processing human language and will come across various scenarios while performing text analysis.

You will learn the basics of getting the environment ready for natural language processing, move on to the initial setup, and then quickly come to understand sentences and parts of language. You will learn how to use machine learning and deep learning to extract information from text data.

By the end of the book, you will have a clear understanding of natural language processing and will have worked on multiple examples that implement NLP in the real world.

Style and approach

This book teaches readers various aspects of natural language processing using NLTK. It takes the reader smoothly from the basic to the advanced level.

Table of contents

  1. Preface
    1. What this book covers
    2. What you need for this book
    3. Who this book is for
    4. Conventions
    5. Reader feedback
    6. Customer support
      1. Downloading the example code
      2. Downloading the color images of this book
      3. Errata
      4. Piracy
      5. Questions
  2. Introduction
    1. Understanding natural language processing
    2. Understanding basic applications
      1. Understanding advanced applications
    3. Advantages of togetherness - NLP and Python
    4. Environment setup for NLTK
    5. Tips for readers
    6. Summary
  3. Practical Understanding of a Corpus and Dataset
    1. What is a corpus?
    2. Why do we need a corpus?
    3. Understanding corpus analysis
      1. Exercise
    4. Understanding types of data attributes
      1. Categorical or qualitative data attributes
      2. Numeric or quantitative data attributes
    5. Exploring different file formats for corpora
    6. Resources for accessing free corpora
    7. Preparing a dataset for NLP applications
      1. Selecting data
      2. Preprocessing the dataset
        1. Formatting
        2. Cleaning
        3. Sampling
        4. Transforming data
    8. Web scraping
    9. Summary
  4. Understanding the Structure of Sentences
    1. Understanding components of NLP
    2. Natural language understanding
      1. Natural language generation
      2. Differences between NLU and NLG
      3. Branches of NLP
    3. Defining context-free grammar
      1. Exercise
    4. Morphological analysis
      1. What is morphology?
      2. What are morphemes?
      3. What is a stem?
      4. What is morphological analysis?
      5. What is a word?
      6. Classification of morphemes
        1. Free morphemes
        2. Bound morphemes
          1. Derivational morphemes
          2. Inflectional morphemes
      7. What is the difference between a stem and a root?
      8. Exercise
      9. Lexical analysis
      10. What is a token?
      11. What are part of speech tags?
      12. Process of deriving tokens
      13. Difference between stemming and lemmatization
      14. Applications
    5. Syntactic analysis
      1. What is syntactic analysis?
    6. Semantic analysis
      1. What is semantic analysis?
      2. Lexical semantics
      3. Hyponymy and hyponyms
        1. Homonymy
        2. Polysemy
      4. What is the difference between polysemy and homonymy?
      5. Application of semantic analysis
    7. Handling ambiguity
      1. Lexical ambiguity
      2. Syntactic ambiguity
        1. Approach to handle syntactic ambiguity
      3. Semantic ambiguity
      4. Pragmatic ambiguity
    8. Discourse integration
      1. Applications
    9. Pragmatic analysis
    10. Summary
  5. Preprocessing
    1. Handling corpus-raw text
      1. Getting raw text
      2. Lowercase conversion
      3. Sentence tokenization
        1. Challenges of sentence tokenization
      4. Stemming for raw text
        1. Challenges of stemming for raw text
      5. Lemmatization of raw text
        1. Challenges of lemmatization of raw text
      6. Stop word removal
      7. Exercise
    2. Handling corpus-raw sentences
      1. Word tokenization
        1. Challenges for word tokenization
      2. Word lemmatization
        1. Challenges for word lemmatization
    3. Basic preprocessing
      1. Regular expressions
        1. Basic level regular expression
        2. Basic flags
        3. Advanced level regular expression
          1. Positive lookahead
          2. Positive lookbehind
          3. Negative lookahead
          4. Negative lookbehind
    4. Practical and customized preprocessing
      1. Decide by yourself
      2. Is preprocessing required?
      3. What kind of preprocessing is required?
      4. Understanding case studies of preprocessing
        1. Grammar correction system
        2. Sentiment analysis
        3. Machine translation
        4. Spelling correction
          1. Approach
    5. Summary
  6. Feature Engineering and NLP Algorithms
    1. Understanding feature engineering
      1. What is feature engineering?
      2. What is the purpose of feature engineering?
      3. Challenges
    2. Basic feature of NLP
      1. Parsers and parsing
        1. Understanding the basics of parsers
        2. Understanding the concept of parsing
        3. Developing a parser from scratch
        4. Types of grammar
          1. Context-free grammar
          2. Probabilistic context-free grammar
        5. Calculating the probability of a tree
        6. Calculating the probability of a string
        7. Grammar transformation
        8. Developing a parser with the Cocke-Kasami-Younger Algorithm
        9. Developing parsers step-by-step
        10. Existing parser tools
          1. The Stanford parser
          2. The spaCy parser
          3. Extracting and understanding the features
        11. Customizing parser tools
        12. Challenges
      2. POS tagging and POS taggers
        1. Understanding the concept of POS tagging and POS taggers
        2. Developing POS taggers step-by-step
        3. Plug and play with existing POS taggers
          1. A Stanford POS tagger example
          2. Using polyglot to generate POS tagging
      3. Exercise
        1. Using POS tags as features
        2. Challenges
      4. Named entity recognition
        1. Classes of NER
        2. Plug and play with existing NER tools
          1. A Stanford NER example
          2. A spaCy NER example
        3. Extracting and understanding the features
        4. Challenges
      5. n-grams
        1. Understanding n-gram using a practice example
        2. Application
      6. Bag of words
        1. Understanding BOW
        2. Understanding BOW using a practical example
        3. Comparing n-grams and BOW
        4. Applications
      7. Semantic tools and resources
    3. Basic statistical features for NLP
      1. Basic mathematics
      2. Basic concepts of linear algebra for NLP
      3. Basic concepts of the probabilistic theory for NLP
        1. Probability
          1. Independent event and dependent event
        2. Conditional probability
      4. TF-IDF
        1. Understanding TF-IDF
        2. Understanding TF-IDF with a practical example
          1. Using textblob
          2. Using scikit-learn
        3. Application
      5. Vectorization
      6. Encoders and decoders
        1. One-hot encoding
        2. Understanding a practical example for one-hot encoding
        3. Application
      7. Normalization
        1. The linguistics aspect of normalization
        2. The statistical aspect of normalization
      8. Probabilistic models
        1. Understanding probabilistic language modeling
        2. Application of LM
      9. Indexing
        1. Application
      10. Ranking
    4. Advantages of features engineering
    5. Challenges of features engineering
    6. Summary
  7. Advanced Feature Engineering and NLP Algorithms
    1. Recall word embedding
    2. Understanding the basics of word2vec
      1. Distributional semantics
      2. Defining word2vec
      3. Necessity of unsupervised distribution semantic model - word2vec
        1. Challenges
    3. Converting the word2vec model from black box to white box
      1. Distributional similarity based representation
    4. Understanding the components of the word2vec model
      1. Input of the word2vec
      2. Output of word2vec
      3. Construction components of the word2vec model
        1. Architectural component
    5. Understanding the logic of the word2vec model
      1. Vocabulary builder
      2. Context builder
      3. Neural network with two layers
        1. Structural details of a word2vec neural network
        2. Word2vec neural network layer's details
        3. Softmax function
      4. Main processing algorithms
        1. Continuous bag of words
        2. Skip-gram
    6. Understanding algorithmic techniques and the mathematics behind the word2vec model
      1. Understanding the basic mathematics for the word2vec algorithm
      2. Techniques used at the vocabulary building stage
        1. Lossy counting
          1. Using it at the stage of vocabulary building
          2. Applications
      3. Techniques used at the context building stage
        1. Dynamic window scaling
          1. Understanding dynamic context window techniques
        2. Subsampling
        3. Pruning
    7. Algorithms used by neural networks
      1. Structure of the neurons
        1. Basic neuron structure
      2. Training a simple neuron
        1. Define error function
          1. Understanding gradient descent in word2vec
        2. Single neuron application
        3. Multi-layer neural networks
          1. Backpropagation
        4. Mathematics behind the word2vec model
      3. Techniques used to generate final vectors and probability prediction stage
        1. Hierarchical softmax
        2. Negative sampling
    8. Some of the facts related to word2vec
    9. Applications of word2vec
    10. Implementation of simple examples
      1. Famous example (king - man + woman)
    11. Advantages of word2vec
    12. Challenges of word2vec
    13. How is word2vec used in real-life applications?
    14. When should you use word2vec?
    15. Developing something interesting
      1. Exercise
    16. Extension of the word2vec concept
      1. Para2Vec
      2. Doc2Vec
      3. Applications of Doc2vec
      4. GloVe
      5. Exercise
    17. Importance of vectorization in deep learning
    18. Summary
  8. Rule-Based System for NLP
    1. Understanding of the rule-based system
      1. What does the RB system mean?
    2. Purpose of having the rule-based system
      1. Why do we need the rule-based system?
      2. Which kind of applications can use the RB approach over the other approaches?
      3. Exercise
      4. What kind of resources do you need if you want to develop a rule-based system?
    3. Architecture of the RB system
      1. General architecture of the rule-based system as an expert system
      2. Practical architecture of the rule-based system for NLP applications
      3. Custom architecture - the RB system for NLP applications
      4. Exercise
      5. Apache UIMA - the RB system for NLP applications
    4. Understanding the RB system development life cycle
    5. Applications
      1. NLP applications using the rule-based system
      2. Generalized AI applications using the rule-based system
    6. Developing NLP applications using the RB system
      1. Thinking process for making rules
        1. Start with simple rules
          1. Scraping the text data
          2. Defining the rule for our goal
          3. Coding our rule and generating a prototype and result
      2. Exercise
      3. Python for pattern-matching rules for a proofreading application
      4. Exercise
      5. Grammar correction
      6. Template-based chatbot application
        1. Flow of code
        2. Advantages of template-based chatbot
        3. Disadvantages of template-based chatbot
      7. Exercise
    7. Comparing the rule-based approach with other approaches
    8. Advantages of the rule-based system
    9. Disadvantages of the rule-based system
    10. Challenges for the rule-based system
    11. Understanding word-sense disambiguation basics
    12. Discussing recent trends for the rule-based system
    13. Summary
  9. Machine Learning for NLP Problems
    1. Understanding the basics of machine learning
      1. Types of ML
        1. Supervised learning
        2. Unsupervised learning
        3. Reinforcement learning
    2. Development steps for NLP applications
      1. Development step for the first iteration
      2. Development steps for the second to nth iteration
    3. Understanding ML algorithms and other concepts
      1. Supervised ML
        1. Regression
        2. Classification
          1. ML algorithms
      2. Exercise
      3. Unsupervised ML
        1. k-means clustering
        2. Document clustering
        3. Advantages of k-means clustering
        4. Disadvantages of k-means clustering
      4. Exercise
      5. Semi-supervised ML
        1. Other important concepts
          1. Bias-variance trade-off
          2. Underfitting
          3. Overfitting
          4. Evaluation matrix
      6. Exercise
        1. Feature selection
          1. Curse of dimensionality
          2. Feature selection techniques
          3. Dimensionality reduction
    4. Hybrid approaches for NLP applications
      1. Post-processing
    5. Summary
  10. Deep Learning for NLU and NLG Problems
    1. An overview of artificial intelligence
      1. The basics of AI
        1. Components of AI
          1. Automation
          2. Intelligence
      2. Stages of AI
        1. Machine learning
        2. Machine intelligence
        3. Machine consciousness
      3. Types of artificial intelligence
        1. Artificial narrow intelligence
        2. Artificial general intelligence
        3. Artificial superintelligence
      4. Goals and applications of AI
        1. AI-enabled applications
    2. Comparing NLU and NLG
      1. Natural language understanding
      2. Natural language generation
    3. A brief overview of deep learning
    4. Basics of neural networks
      1. The first computation model of the neuron
      2. Perceptron
      3. Understanding mathematical concepts for ANN
        1. Gradient descent
          1. Calculating error or loss
          2. Calculating gradient descent
        2. Activation functions
          1. Sigmoid
          2. TanH
          3. ReLU and its variants
        3. Loss functions
    5. Implementation of ANN
      1. Single-layer NN with backpropagation
        1. Backpropagation
      2. Exercise
    6. Deep learning and deep neural networks
      1. Revisiting DL
      2. The basic architecture of DNN
      3. Deep learning in NLP
      4. Difference between classical NLP and deep learning NLP techniques
    7. Deep learning techniques and NLU
      1. Machine translation
    8. Deep learning techniques and NLG
      1. Exercise
      2. Recipe summarizer and title generation
    9. Gradient descent-based optimization
    10. Artificial intelligence versus human intelligence
    11. Summary
  11. Advanced Tools
    1. Apache Hadoop as a storage framework
    2. Apache Spark as a processing framework
    3. Apache Flink as a real-time processing framework
    4. Visualization libraries in Python
    5. Summary
  12. How to Improve Your NLP Skills
    1. Beginning a new career journey with NLP
    2. Cheat sheets
    3. Choose your area
    4. Agile way of working to achieve success
    5. Useful blogs for NLP and data science
    6. Grab public datasets
    7. Mathematics needed for data science
    8. Summary
  13. Installation Guide
    1. Installing Python, pip, and NLTK
    2. Installing the PyCharm IDE
    3. Installing dependencies
    4. Framework installation guides
    5. Drop your queries
    6. Summary

Product information

  • Title: Python Natural Language Processing
  • Author(s): Jalaj Thanaki
  • Release date: July 2017
  • Publisher(s): Packt Publishing
  • ISBN: 9781787121423