www.it-ebooks.info
Apache Solr 3 Enterprise
Search Server
Enhance your search with faceted navigation, result
highlighting, relevancy ranked sorting, and more
David Smiley
Eric Pugh
BIRMINGHAM - MUMBAI
Apache Solr 3 Enterprise Search Server
Copyright © 2011 Packt Publishing
All rights reserved. No part of this book may be reproduced, stored in a retrieval
system, or transmitted in any form or by any means, without the prior written
permission of the publisher, except in the case of brief quotations embedded in
critical articles or reviews.
Every effort has been made in the preparation of this book to ensure the accuracy
of the information presented. However, the information contained in this book is
sold without warranty, either express or implied. Neither the author(s), nor Packt
Publishing, and its dealers and distributors will be held liable for any damages
caused or alleged to be caused directly or indirectly by this book.
Packt Publishing has endeavored to provide trademark information about all of the
companies and products mentioned in this book by the appropriate use of capitals.
However, Packt Publishing cannot guarantee the accuracy of this information.
First published: August 2009
Second published: November 2011
Production Reference: 2041111
Published by Packt Publishing Ltd.
Livery Place
35 Livery Street
Birmingham B3 2PB, UK.
ISBN 978-1-84951-606-8
www.packtpub.com
Cover Image by Duraid Fatouhi (duraidfatouhi@yahoo.com)
Credits
Authors
David Smiley
Eric Pugh
Reviewers
Jerome Eteve
Mauricio Scheffer
Acquisition Editor
Sarah Cullington
Development Editors
Shreerang Deshpande
Gaurav Mehta
Technical Editor
Kavita Iyer
Project Coordinator
Joel Goveya
Proofreader
Steve Maguire
Indexers
Hemangini Bari
Rekha Nair
Production Coordinator
Alwin Roy
Cover Work
Alwin Roy
About the Authors
Born to code, David Smiley is a senior software engineer with a passion for
programming and open source. He has written a book, taught a class, and presented
at conferences on the subject of Solr. He has 12 years of experience in the defense
industry at MITRE, using Java and various web technologies. Recently, David has
been focusing his attention on the intersection of geospatial technologies with Lucene
and Solr.
David first used Lucene in 2000 and was immediately struck by its speed and
novelty. Years later he had the opportunity to work with Compass, a Lucene based
library. In 2008, David built an enterprise people and project search service with
Solr, with a focus on search relevancy tuning. David began to learn everything there
is to know about Solr, culminating with the publishing of Solr 1.4 Enterprise Search
Server in 2009, the first book on Solr. He has since developed and taught a two-day
Solr course for MITRE and he regularly offers technical advice to MITRE and its
customers on the use of Solr. David also has experience using Endeca's competing
product, which has broadened his experience in the search field.
On a technical level, David has solved challenging problems with Lucene and Solr
including geospatial search, wildcard ngram query parsing, searching multiple
multi-valued fields at coordinated positions, and part-of-speech search using Lucene
payloads. In the area of geospatial search, David open sourced his geohash prefix/grid
based work to the Solr community, tracked as SOLR-2155. This work has led to
presentations at two conferences. Presently, David is collaborating with other Lucene
and Solr committers on geospatial search.
Acknowledgement
Most, if not all, authors seem to dedicate their book to someone. As simply a reader
of books I have thought of this seeming prerequisite as customary tradition. That
was my feeling before I embarked on writing about Solr, a project that has sapped
my previously "free" time on nights and weekends for a year. I chose this sacrifice
and want no pity for what was my decision, but my wife, family and friends did not
choose it. I am married to my lovely wife Sylvie who has easily sacrificed as much
as I have to work on this project. She has suffered through the first edition with an
absentee husband while bearing our first child, Camille. The second edition was
a similar circumstance with the birth of my second daughter, Adeline. I officially
dedicate this book to my wife Sylvie and my daughters Camille and Adeline, who
I both lovingly adore. I also pledge to read book dedications with new-found
first-hand experience at what the dedication represents.
I would also like to thank others who helped bring this book to fruition. Namely, if it
were not for Doug Cutting creating Lucene with an open source license, there would
be no Solr. Furthermore, CNET's decision to open source what was an in-house
project, Solr itself, in 2006, deserves praise. Many corporations do not understand
that open source isn't just "free code" you get for free that others write: it is an
opportunity to let your code flourish in the outside instead of it withering inside.
Last but not least, this book would not have been completed in a reasonable
time were it not for the assistance of my contributing author, Eric Pugh. His own
perspectives and experiences have complemented mine so well that I am absolutely
certain the quality of this book is much better than what I could have done alone.
Thank you all.
David Smiley
Eric Pugh has been fascinated by the "craft" of software development, and has been
heavily involved in the open source world as a developer, committer, and user for
the past five years. He is an emeritus member of the Apache Software Foundation
and lately has been mulling over how we solve the problem of finding answers in
datasets when we don't know the questions ahead of time to ask.
In biotech, financial services, and defense IT, he has helped European and American
companies develop coherent strategies for embracing open source search software.
As a speaker, he has advocated the advantages of Agile practices with a focus on
testing in search engine implementation.
Eric became involved with Solr when he submitted the patch SOLR-284 for Parsing
Rich Document types such as PDF and MS Office formats that became the single
most popular patch as measured by votes! The patch was subsequently cleaned
up and enhanced by three other individuals, demonstrating the power of the
open source model to build great code collaboratively. SOLR-284 was eventually
refactored into Solr Cell as part of Solr version 1.4.
He blogs at
http://www.opensourceconnections.com/
Acknowledgement
When the topic of producing an update of this book for Solr 3 first came up, I
thought it would be a matter of weeks to complete it. However, when David Smiley
and I sat down to scope out what to change about the book, it was immediately
apparent that we didn't want to just write an update for the latest Solr, we wanted
to write a complete second edition of the book. We added a chapter, moved around
content, and rewrote whole sections of the book. David put in many more long nights
than I over the past 9 months writing what I feel justified in calling the Second
Edition of our book. So I must thank his wife Sylvie for being so supportive of him!
I also want to thank again Erik Hatcher for his continuing support and mentorship.
Without his encouragement I wouldn't have spoken at Euro Lucene, or become
involved in the Blacklight community.
I also want to thank all of my colleagues at OpenSource Connections. We've come
a long way as a company in the last 18 months, and I look forward to the next 18
months. Our Friday afternoon hack sessions re-invigorate me every week!
My darling wife Kate, I know 2011 turned into a very busy year, but I couldn't be
happier sharing my life with you, Morgan, and baby Asher. I love you.
Lastly I want to thank all the adopters of Solr and Lucene! Without you, I wouldn't
have this wonderful open source project to be so incredibly proud to be a part of! I
look forward to meeting more of you at the next LuceneRevolution or Euro Lucene
conference.
About the Reviewers
Jerome Eteve holds an MSc in IT and Sciences from the University of Lille (France).
After starting his career in the field of bioinformatics where he worked as a
Biological Data Management and Analysis Consultant, he's now a Senior Application
Developer with interests ranging from architecture to delivering a great user
experience online. He's passionate about open source technologies, search engines,
and web application architecture.
He now works for WCN Plc, a leading provider of recruitment software solutions.
He has worked on Packt's Enterprise Solr published in 2009.
Mauricio Scheffer is a software developer currently living in Buenos Aires,
Argentina. He's worked in dot-coms on almost everything related to web application
development, from architecture to user experience. He's very active in the open
source community, having contributed to several projects and started many projects
of his own. In 2007 he wrote SolrNet, a popular open source Solr interface for
the .NET platform. Currently he's also researching the application of functional
programming to web development as part of his Master's thesis.
He blogs at
http://bugsquash.blogspot.com.
www.PacktPub.com
This book is published by Packt Publishing. You might want to visit Packt's website
at www.PacktPub.com and take advantage of the following features and offers:
Discounts
Have you bought the print copy or Kindle version of this book? If so, you can get a
massive 85% off the price of the eBook version, available in PDF, ePub, and MOBI.
Simply go to http://www.packtpub.com/apache-solr-3-enterprise-search-server/book,
add it to your cart, and enter the following discount code:
as3esebk
Free eBooks
If you sign up to an account on www.PacktPub.com, you will have access to nine
free eBooks.
Newsletters
Sign up for Packt's newsletters, which will keep you up to date with offers,
discounts, books, and downloads.
You can set up your subscription at
www.PacktPub.com/newsletters.
Code Downloads, Errata and Support
Packt supports all of its books with errata. While we work hard to eradicate
errors from our books, some do creep in. Meanwhile, many Packt books have
accompanying snippets of code to download.
You can find errata and code downloads at
www.PacktPub.com/support.
Table of Contents
[...]
MusicBrainz.org
One combined index or separate indices
One combined index
Problems with using a single combined index
Separate indices
Schema design
Step 1: Determine which searches are going to be powered by Solr
Step 2: Determine the entities returned from each search
Step 3: Denormalize related data
[...]
[...] queries
Range queries
Fuzzy queries
Date math
Score boosting
Existence (and non-existence) queries
Escaping special characters
The Dismax query parser (part 1)
Searching multiple fields
Limited query syntax
Min-should-match
Basic rules
Multiple rules
What to choose
A default search
Filtering
Sorting
Geospatial search
Indexing locations
Filtering [...]
Chapter 8: Deployment
Deployment methodology for Solr
Questions to ask
Installing Solr into a Servlet container
Differences between Servlet containers
Defining solr.home property
Logging
HTTP server request access logs
Solr application logging
A SearchHandler per search interface?
Leveraging Solr cores
Configuring solr.xml
Configuring logging output [...]
Monitoring Solr performance
Stats.jsp
JMX
Starting Solr with JMX
Securing Solr from prying eyes
Limiting server access
Securing public searches
Controlling JMX access
Securing index data
Controlling document access
Other things to look at
Summary
Chapter 9: Integrating Solr
Working with included examples
Inventory of examples
Solritas, the integrated search UI
Pros and Cons of Solritas
SolrJ: Simple Java interface
Using Heritrix to download artist pages
SolrJ-based client for Indexing HTML
SolrJ client API
Embedding Solr
Searching with SolrJ
Indexing
When should I use embedded Solr?
In-process indexing
Standalone desktop applications
[...] with Solr
Wait, what about security?
Building a Solr powered artists autocomplete widget with jQuery and JSONP
AJAX Solr
Using XSLT to expose Solr via OpenSearch
OpenSearch based Browse plugin
Installing the Search MBArtists plugin
Accessing Solr from PHP applications
solr-php-client
Drupal options
Apache Solr Search integration module
Hosted Solr by Acquia
Ruby on Rails integrations
The Ruby query response writer
sunspot_rails gem
Setting up MyFaves project
Populating MyFaves relational database from Solr
Build Solr indexes from a relational database
Complete MyFaves website
Which Rails/Ruby library should I use?
Nutch for crawling web pages
Maintaining [...] ManifoldCF
Connectors
Putting ManifoldCF to use
Summary
Chapter 10: Scaling Solr
Tuning complex systems
Testing Solr performance with SolrMeter
Optimizing a single Solr server (Scale up)
Configuring JVM settings to improve memory usage
MMapDirectoryFactory to leverage additional virtual memory
Enabling downstream HTTP caching
Solr caching
Tuning caches
Indexing performance
Designing the schema
Sending data to Solr in bulk
Don't overlap commits
Disabling unique key checking
Index optimization factors
Enhancing faceting performance
Using term vectors
Improving phrase search performance
Moving to multiple Solr servers (Scale horizontally)
Replication
Starting multiple Solr servers
Configuring replication
Load balancing searches across slaves
Indexing into the master server
Configuring slaves
Configuring [...] balancing
Sharding indexes
Assigning documents to shards
Searching across shards (distributed search)
Combining replication and sharding (Scale deep)
Near real time search
Where next for scaling Solr?
Summary
Appendix: Search Quick Reference
Index
Quick reference [...]