Book description
Solve real-time problems related to Apache Solr 4.x and 5.0 effectively with the help of over 100 easy-to-follow recipes
In Detail
Starting with essential information on setting up Solr, you will quickly progress to analyzing your text data, querying, and improving performance.
With the help of intermediate and advanced recipes, you will learn how to index data and query Solr. Then, you will dive deep into faceting and learn how to improve Solr's performance. You will also work with SolrCloud clusters and get to grips with the advanced functionalities of Solr. Finally, you will explore real-life situations where Solr can be used to simplify daily collection handling. By the end of this book, you will be able to produce enhanced, optimized, and powerful results by implementing pro-level practices and techniques.
What You Will Learn
- Acquire the skills needed to index your data from different sources and in different formats and forms
- Overcome common problems while analyzing your data
- Use the faceting mechanism to get aggregated information about your data
- Improve your Solr instance and Solr cluster performance
- Learn how to configure and use SolrCloud
- Make use of the highlighting and document grouping functionalities
- Diagnose and resolve problems with Solr instances and clusters
- Implement different autocomplete functionalities
Table of contents
Solr Cookbook Third Edition
- Table of Contents
- Solr Cookbook Third Edition
- Credits
- About the Author
- Acknowledgments
- About the Reviewers
- www.PacktPub.com
- Preface
1. Apache Solr Configuration
- Introduction
- Running Solr on a standalone Jetty
- Installing ZooKeeper for SolrCloud
- Migrating configuration from master-slave to SolrCloud
- Choosing the proper directory configuration
- Configuring the Solr spellchecker
- Using Solr in a schemaless mode
- Limiting I/O usage
- Using core discovery
- Configuring SolrCloud for NRT use cases
- Configuring SolrCloud for high-indexing use cases
- Configuring SolrCloud for high-querying use cases
- Configuring the Solr heartbeat mechanism
- Changing similarity
2. Indexing Your Data
- Introduction
- Indexing PDF files
- Counting the number of fields
- Using parsing update processors to parse data
- Using scripting update processors to modify documents
- Indexing data from a database using Data Import Handler
- Incremental imports with DIH
- Transforming data when using DIH
- Indexing multiple geographical points
- Updating document fields
- Detecting the document language during indexation
- Optimizing the primary key indexation
- Handling multiple currencies
3. Analyzing Your Text Data
- Introduction
- Using the enumeration type
- Removing HTML tags during indexing
- Storing data outside of Solr index
- Using synonyms
- Stemming different languages
- Using nonaggressive stemmers
- Using the n-gram approach to do performant trailing wildcard searches
- Using position increment to divide sentences
- Using patterns to replace tokens
4. Querying Solr
- Introduction
- Understanding and using the Lucene query language
- Using position aware queries
- Using boosting with autocomplete
- Phrase queries with shingles
- Handling user queries without errors
- Handling hierarchies with nested documents
- Sorting data on the basis of a function value
- Controlling the number of terms needed to match
- Affecting document score using function queries
- Using simple nested queries
- Using the Solr document query join functionality
- Handling typos with n-grams
- Rescoring query results
5. Faceting
- Introduction
- Getting the number of documents with the same field value
- Getting the number of documents with the same value range
- Getting the number of documents matching the query and subquery
- Removing filters from faceting results
- Using decision tree faceting
- Calculating faceting for relevant documents in groups
- Improving faceting performance for low cardinality fields
6. Improving Solr Performance
- Introduction
- Handling deep paging efficiently
- Configuring the document cache
- Configuring the query result cache
- Configuring the filter cache
- Improving Solr query performance after the start and commit operations
- Lowering the memory consumption of faceting and sorting
- Speeding up indexing with Solr segment merge tuning
- Avoiding caching of rare filters to improve the performance
- Controlling the filter execution to improve expensive filter performance
- Configuring numerical fields for high-performance sorting and range queries
7. In the Cloud
- Introduction
- Creating a new SolrCloud cluster
- Setting up multiple collections on a single cluster
- Splitting shards
- Having more than a single shard from a collection on a node
- Creating a collection on defined nodes
- Adding replicas after collection creation
- Removing replicas
- Moving shards between nodes
- Using aliasing
- Using routing
8. Using Additional Functionalities
- Introduction
- Finding similar documents
- Highlighting fragments found in documents
- Efficient highlighting
- Using versioning
- Retrieving information about the index structure
- Altering the index structure on a live collection
- Grouping documents by the field value
- Grouping documents by the query value
- Grouping documents by the function value
- Efficient documents grouping using the post filter
9. Dealing with Problems
- Introduction
- Dealing with the too many opened files exception
- Diagnosing and dealing with memory problems
- Configuring sorting for non-English languages
- Migrating data to another collection
- SolrCloud read-side fault tolerance
- Using the check index functionality
- Adjusting the Jetty configuration to avoid deadlocks
- Tuning segment merging
- Avoiding swapping
10. Real-life Situations
- Introduction
- Implementing the autocomplete functionality for products
- Implementing the autocomplete functionality for categories
- Handling time-sliced data using aliases
- Boosting words closer to each other
- Using the Solr spellchecking functionality
- Using the Solr administration panel for monitoring
- Automatically expiring Solr documents
- Exporting whole query results
- Index
Product information
- Title: Solr Cookbook - Third Edition
- Author(s):
- Release date: January 2015
- Publisher(s): Packt Publishing
- ISBN: 9781783553150