Elasticsearch Blueprints

Book description

A practical project-based guide to generating compelling search solutions using the dynamic and powerful features of Elasticsearch

In Detail

Elasticsearch is a distributed search server similar to Apache Solr, with a focus on large datasets, schemaless setup, and high availability. Built on the Apache Lucene library (also used in Apache Solr), Elasticsearch enables powerful full-text search, as well as autocomplete, "more like this" search, multilingual functionality, and an extensive search query DSL.

This book starts with the creation of a Google-like web search service, enabling you to generate your own search results. You will then learn how an e-commerce website can be built using Elasticsearch. We will discuss various approaches to ranking relevant content higher in the results, such as relevancy based on how well a query matches the text, recency for time-based documents, geographic proximity, and other frequently used approaches.

Finally, the book will cover the various geo capabilities of Elasticsearch, which let your searches reflect real-world scenarios.

What You Will Learn

  • Build a simple, scalable server for effective searching in Elasticsearch
  • Design a scalable e-commerce search solution that generates accurate search results using various filters, such as date range and price range filters
  • Improve the relevancy and scoring of your searches
  • Manage real-world, complex data using various techniques, including parent-child search and searching questions and answers based on each other's criteria
  • Use the excellent data-crunching and aggregation capabilities of Elasticsearch to analyze your data
  • Generate real-time visualizations of your data using compelling visualization techniques, such as time graphs, pie charts, and stacked graphs
  • Enhance the quality of your search and widen the scope of matches using various analyzer techniques, such as lowercasing, stemming, and synonym matching

Table of contents

  1. Elasticsearch Blueprints
    1. Table of Contents
    2. Elasticsearch Blueprints
    3. Credits
    4. About the Author
    5. About the Reviewer
    6. www.PacktPub.com
      1. Support files, eBooks, discount offers, and more
        1. Why subscribe?
        2. Free access for Packt account holders
    7. Preface
      1. What this book covers
      2. What you need for this book
      3. Who this book is for
      4. Conventions
      5. Reader feedback
      6. Customer support
        1. Downloading the example code
        2. Errata
        3. Piracy
        4. Questions
    8. 1. Google-like Web Search
      1. Deploying Elasticsearch
      2. Communicating with the Elasticsearch server
        1. Shards and replicas
        2. Index-type mapping
      3. Setting the analyzer
        1. Types of character filters
        2. Types of tokenizers
        3. Types of token filters
        4. Creating your own analyzer
        5. Readymade analyzers
      4. Using phrase query to search
      5. Using the highlighting feature
      6. Pagination
        1. The head UI explained
      7. Summary
    9. 2. Building Your Own E-Commerce Solution
      1. Data modeling in Elasticsearch
      2. Choosing between a query and a filter
      3. Searching your documents
        1. A match query
          1. Multifield match query
      4. Aggregating your results
        1. Terms aggregation
      5. Filter your results based on a date range
      6. Implementing a price range filter
      7. Implementing a category filter
      8. Implementation of filters in Elasticsearch
      9. Searching with multiple conditions
      10. Sorting results
      11. Using the scroll API for consistent pagination
      12. Autocomplete in Elasticsearch
        1. How does FST help in faster autocompletes?
      13. Hotel suggester using autocomplete
      14. Summary
    10. 3. Relevancy and Scoring
      1. How scoring works
        1. How to debug scoring
      2. The Ebola outbreak
        1. Boost match in the title field column over description
        2. Most recently published medical journals
        3. The most recent Ebola report on healthy patients
        4. Boosting certain symptoms over others
        5. Random ordering of medical journals for different interns
        6. Medical journals from the closest place to the Ebola outbreak
        7. Medical journals from unhealthy places near the Ebola outbreak
        8. Healthy people from unhealthy locations have Ebola symptoms
        9. Relevancy based on the order in which the symptoms appeared
      3. Summary
    11. 4. Managing Relational Content
      1. The product-with-tags search problem
      2. Nested types to the rescue
      3. Limitations on a query on nested fields
      4. Using a parent-child approach
        1. The has_parent filter/the has_parent query
          1. The has_child query/the has_child filter
          2. The top_children query
      5. Schema design to store questions and answers
      6. Searching questions based on the criteria of answers
      7. Searching answers based on the criteria of questions
      8. The score of questions based on the score of each answer
      9. Filtering questions with more than four answers
        1. Displaying the best questions and their accepted answers
      10. Summary
    12. 5. Analytics Using Elasticsearch
      1. A flight ticket analytics scenario
        1. Index creation and mapping
        2. A case study on analytics requirements
          1. Male and female distribution of passengers
          2. Time-based patterns or trends in booking tickets
          3. Hottest arrival and departure points
          4. The correlation of ticket type with time
          5. Distribution of the travel duration
          6. The most preferred or hottest hour for booking tickets
          7. The most preferred or hottest weekday for travel
          8. The pattern between a passenger's purpose of visit, ticket type, and their sex
      2. Summary
    13. 6. Improving the Search Experience
      1. News search
      2. A case-insensitive search
      3. Effective e-mail or URL link search inside text
      4. Prioritizing a title match over content match
      5. Terms aggregation giving weird results
        1. Setting the field as not_analyzed
      6. Using a lowercased analyzer
      7. Improving the search experience using stemming
      8. A synonym-aware search
      9. The holy box of search
        1. The field search
        2. The number/date range search
        3. The phrase search
        4. The wildcard search
        5. The regexp search
      10. Boolean operations
      11. Words with similar sounds
      12. Substring matching
      13. Summary
    14. 7. Spicing Up a Search Using Geo
      1. Restaurant search
      2. Data modeling for restaurants
      3. The nearest hotel problem
      4. The maximum distance covered
      5. Inside the city limits
      6. Distance values between the current point and each restaurant
        1. Restaurants out of city limits
      7. Restaurant categorization based on distance
      8. Aggregating restaurants based on their nearness
      9. Summary
    15. 8. Handling Time-based Data
      1. Overriding default mapping and settings in Elasticsearch
      2. Index template creation
        1. Deleting a template
        2. The GET template
        3. Multiple matching of templates
        4. Overriding default settings for all indices
        5. Overriding mapping of all types under an index
        6. Overriding default field settings
      3. Searching for time-based data
      4. Archiving time-based data
        1. Shard filtering
        2. Running the optimize API on indices where writing is done
      5. Closing older indices
        1. Snapshot creation and restoration of indices
          1. Repository creation
      6. Snapshot creation
        1. Snapshot creation on specific indices
      7. Restoring a snapshot
      8. Restoring multiple indices
      9. The curator
      10. Shard allocation using curator
        1. Opening and closing of indices
      11. Optimization
      12. Summary
    16. Index

Product information

  • Title: Elasticsearch Blueprints
  • Author(s): Vineeth Mohan
  • Release date: July 2015
  • Publisher(s): Packt Publishing
  • ISBN: 9781783984923