With the introduction of Ferret, Ruby users now have one of the fastest and most flexible search libraries available. And it's surprisingly easy to use.
This book will show you how to quickly get up and running with Ferret. You'll learn how to index different document types such as PDF, Microsoft Word, and HTML, as well as how to deal with foreign languages and different character encodings. The book describes the Ferret Query Language in detail, along with the object-oriented approach to building queries.
You will also be introduced to sorting, filtering, and highlighting your search results, with an explanation of exactly how you need to set up your index to perform these tasks. Finally, you will learn how to optimize a Ferret index for lightning-fast indexing and split-second query results.
David Balmain is a freelance software developer and the primary developer of the open source search library Ferret. He gained an interest in high performance text processing at university where he earned a BSc specializing in natural language processing. Recently he has taken an interest in web application development and become enamored with the scripting language Ruby.
David currently lives with his girlfriend in a 12-square-meter apartment in Tokyo, where he practices judo five hours a day and is trying to learn Japanese.
The animal on the cover of Ferret is a ferret. The scientific name for the domestic ferret is Mustela putorius furo, or "weasel-like smelly thief." These slender, carnivorous mammals are about 20 inches long (including a 5-inch tail), weigh 2-4 pounds, and live for 7-10 years. Common colors include albino, chocolate, butterscotch, silver, and cinnamon. The domestic ferret is sometimes confused with the wild black-footed ferret (Mustela nigripes), an endangered North American mammal related to the Russian polecat. Male ferrets are called hobs, female ferrets are jills, and young ferrets are kits. A group of ferrets is a business.
Ferrets were first bred 2,500 years ago in Africa for hunting rabbits. Today they are more often kept as pets, and are now the third most popular pet in the United States after cats and dogs. Ferrets are intelligent and playful; they can recognize their names and learn simple tricks. They have a habit of stealing household objects and hiding them: socks, keys, books, umbrellas, TV remotes, even fish out of bowls. When ferrets are excited and want to play, they bounce and flop around in a routine known as a "weasel war dance." They may also hiss and arch their backs. Ferrets in war dances tend to be clumsy, often hopping into things or tripping on their own feet.
Some parts of the world restrict the keeping of ferrets. A ferret-free zone, or FFZ, is a place where ferrets are banned or illegal. Three reasons are often cited for a ban: ferrets may bite or scratch children; there is no proven rabies vaccine for ferrets; and ferrets may threaten native wildlife. However, these points are often disputed. Former mayor of New York City Rudy Giuliani infamously clashed with ferret lovers in 2001, when the city council considered dropping the ban on ferrets and Giuliani opposed it, railing against ferrets as "wild animals." Still, many regions are being persuaded to change their anti-ferret laws, and the only U.S. states that now ban ferrets are California and Hawaii.
The cover image is from the Dover Pictorial Archive. The cover font is Adobe's ITC Garamond. The text font is Linotype Birka, the heading font is Adobe Myriad Condensed, and the code font is LucasFont's TheSans Mono Condensed.
Life is so much better with Ferret (and this PDF)!
By Kevin Marshall
I've been using Ferret for about a year now, having initially stumbled through the implementation using a combination of the Java Lucene book from Manning and the online docs at David's site...so I came into reading this PDF with at least a little experience of how things used to work (one of my applications even relies on Ferret to index and Java Lucene to search that index).
Since most of the stuff I've been using Ferret with has been running pretty well, my expectation was that maybe I would learn a few little tricks to tweak stuff here and there. But within the first few minutes of skimming through the PDF, I knew I was going to have to print this baby out and get ready for some serious fun!
I was clearly doing things the hard way before reading this PDF. Luckily, David lays out everything you would ever want to know about Ferret (and more) in a very easy-to-follow and even easier-to-implement way with lots of code examples and comments. So it was amazingly easy to get my indexing programs updated to take advantage of all the tips and tricks he lays out (and there are lots throughout this PDF).
All in all, if you run any sort of content-based web site (or Ruby application), you probably need solid searching features...and Ferret is by far the best answer to that problem that I know of...and this PDF is the absolute best way to get it all set up and running with the least amount of energy or worry.