Solr Plugin

Enterprise Search Engine for Foswiki based on Solr

About Solr

Solr is an open source enterprise search server based on the Lucene Java search library, with XML/HTTP and JSON APIs, hit highlighting, faceted search, caching, replication, and a web administration interface. It runs in a Java servlet container such as Tomcat or Jetty.

This extension ships with a Jetty instance that is preconfigured to run the Solr webapp right away.

Installation

First follow the normal plugin installation instructions below. You do not need to install anything in the browser to use this extension. The following instructions are for the administrator who installs the extension on the server.

Open configure, and open the "Extensions" section. Use "Find More Extensions" to get a list of available extensions. Select "Install".

If you have any problems, or if the extension isn't available in configure, then you can still install manually from the command-line. See http://foswiki.org/Support/ManuallyInstallingExtensions for more help.

Downloading SolrPlugin-bin

SolrPlugin is distributed in two parts:

  1. SolrPlugin - the Foswiki specific part and
  2. SolrPlugin-bin - java binary package containing the latest stable Solr release, Jetty, and other required jar packages

Both have to be downloaded and installed. They are available at Foswiki:Extensions/SolrPlugin. After downloading, both packages must be unpacked in your Foswiki installation directory (<foswiki-dir>).

Experts Note: If there are pre-existing installations of Solr and Jetty (or Tomcat) already on the server, you may be able to re-use them by configuring them so they are aware of Foswiki content. In this case, the SolrPlugin-bin package may not be required.

Starting the Solr webservice

The SolrPlugin will send all content to be indexed to the Solr webservice via HTTP. This webservice must be installed in a servlet container of your choice, e.g. Tomcat or Jetty, and can be hosted anywhere on your network.

Preconfigured Jetty

SolrPlugin comes with a ready-to-use Jetty engine, configured to start a Solr server on the same host where the Foswiki engine is running. It can either be started manually using the included solrstart tool, or be launched automatically if SolrPlugin can't contact the server.

First ensure that the FOSWIKI_ROOT environment variable points to the root of your Foswiki installation.

By default this Solr server listens on port 8983 on localhost and, for security reasons, only accepts connections from localhost. You can change these settings in:

$FOSWIKI_ROOT/solr/etc/jetty.xml

Once you are happy with these settings, start the Solr daemon in the background using:

cd $FOSWIKI_ROOT/solr
../tools/solrstart

You should see a number of output messages indicating that the server has started. You can check that the service is running by visiting http://localhost:8983/solr/ on the server.
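
The same check can be scripted instead of being done in a browser. The following is a minimal Python sketch, assuming the default host and port given above; the helper names solr_base_url and solr_is_up are made up for this illustration and are not part of SolrPlugin.

```python
from urllib.error import URLError
from urllib.request import urlopen


def solr_base_url(host="localhost", port=8983):
    """Build the URL of the Solr webapp (defaults taken from the text above)."""
    return f"http://{host}:{port}/solr/"


def solr_is_up(url, timeout=5):
    """Return True if the URL answers with a successful HTTP response."""
    try:
        with urlopen(url, timeout=timeout) as response:
            return 200 <= response.status < 400
    except (URLError, OSError):
        # Connection refused, timeout, DNS failure or an HTTP error status.
        return False
```

Run on the server itself, solr_is_up(solr_base_url()) should return True once solrstart has brought the daemon up. If you changed jetty.xml, pass the host and port you configured there.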

Make sure that Foswiki is configured to connect to the Solr server at the correct URL in configure.

Before you can do any searches, you will have to add some content to the index as described below.

Setting up separate cores for virtual hosting

SolrPlugin supports virtual hosting using Foswiki:Extensions/VirtualHostingContrib by assigning a separate "core" to each virtual host. This ensures that indexes are created for each virtual host separately and content from one host does not leak to another, while still using a single Solr service answering queries for all virtual hosts.

Multiple cores are located in sub-directories under ../solr/multicore/. By default SolrPlugin comes with a predefined core named foswiki. All data for this core is stored in the .../solr/multicore/foswiki/ directory. All cores have separate URLs to access them:

http://<server-name>:<port>/solr/<core-name>

To create a new core for mydomain.com, copy the $FOSWIKI_ROOT/solr/multicore/_template directory to a new directory, $FOSWIKI_ROOT/solr/multicore/mydomain.com. Then manually create a soft link to the shared configuration:

cd $FOSWIKI_ROOT/solr/multicore/mydomain.com && ln -s ../conf

All known cores are listed in the ../solr/multicore/solr.xml file. To add your new mydomain.com core, alter this file to:

<?xml version="1.0" encoding="UTF-8" ?>
<solr persistent="false" sharedLib="lib">
  <cores adminPath="/admin/cores" sharedSchema="true">
    <core name="foswiki" instanceDir="foswiki" /> <!-- default core -->
    <core name="mydomain.com" instanceDir="mydomain.com" /> <!-- additional core for mydomain.com -->
    <!-- ... other cores ... -->
  </cores>
</solr>

You must restart the Solr server when you make any changes to the solr.xml file. Test if your new core is available by accessing the admin console:

http://<server-name>:<port>/solr/<core-name>/admin
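
If you manage many virtual hosts, the solr.xml change can be scripted. The sketch below uses only the Python standard library and assumes the file layout shown above (a cores element holding one core element per host); add_core and core_admin_url are hypothetical helper names, not tools shipped with SolrPlugin.

```python
import xml.etree.ElementTree as ET

# Sample input mirroring the solr.xml structure shown above.
SAMPLE = """<solr persistent="false" sharedLib="lib">
  <cores adminPath="/admin/cores" sharedSchema="true">
    <core name="foswiki" instanceDir="foswiki" />
  </cores>
</solr>"""


def add_core(solr_xml, name):
    """Return solr.xml text with a core called `name` added (idempotent)."""
    root = ET.fromstring(solr_xml)
    cores = root.find("cores")
    if cores is None:
        raise ValueError("no <cores> element found")
    already_there = any(c.get("name") == name for c in cores.findall("core"))
    if not already_there:
        # By the convention above, instanceDir equals the core name.
        ET.SubElement(cores, "core", {"name": name, "instanceDir": name})
    return ET.tostring(root, encoding="unicode")


def core_admin_url(server, port, core):
    """Admin console URL of a core, following the pattern shown above."""
    return f"http://{server}:{port}/solr/{core}/admin"


if __name__ == "__main__":
    print(add_core(SAMPLE, "mydomain.com"))
    print(core_admin_url("localhost", 8983, "mydomain.com"))
```

Note that ElementTree discards XML comments and the XML declaration when parsing, so edit the file by hand if you want to keep them. The Solr server still has to be restarted after the file has changed.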

Note: depending on your Tomcat or Jetty configuration, access to this port might be restricted. In this case, use the $FOSWIKI_ROOT/tools/virtualhost-solrsearch tool to make a basic query.

cd $FOSWIKI_ROOT/tools
./virtualhost-solrsearch host=mydomain.com

Use the host parameter to access other virtual hosts.

There is a set of additional tools that have a variation for a virtual host setup:

  • virtualhost-solrsearch instead of solrsearch
  • virtualhost-solrindex instead of solrindex
  • virtualhost-solrdelete instead of solrdelete

The solrjob tool is a wrapper around solrindex and will use either solrindex or virtualhost-solrindex depending on the host commandline parameter.

Indexing existing content offline

Before using SolrSearch you will need to index your content completely. Let's first make sure the indexer is working by indexing a single topic (use virtualhost-solrindex if you are using virtual hosting):

perl $FOSWIKI_ROOT/tools/solrindex topic=Main.WebHome

Now make sure this topic shows up in SolrSearch.

If that worked, you can index the whole wiki.

perl $FOSWIKI_ROOT/tools/solrindex host=foswiki mode=full optimize=on

Replace foswiki with the name of your virtual host if you are using virtual hosting.

This will crawl all webs, topics and attachments and submit them to the Solr server, which will build up the search index. This can take a while depending on the amount of content and the number of users registered on your site, so you may prefer to do it at a quiet time.

During this process attachments are "stringified" using the StringifierContrib. "Stringification" is the process of converting binary files into a plain text format that Solr can read. SolrPlugin will cache the stringified version of all attachments, and will only process them again if the corresponding binary version has changed. Thus the next full index run will be significantly faster.

SolrPlugin reads the access control information for a document while indexing it. This information is indexed together with the document, and every search request takes it into account, so that only users with VIEW rights to a document can retrieve it using SolrSearch.
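
To illustrate the idea, here is a minimal Python sketch of such an access filter. It assumes the access_granted field and its special value "all" from the indexing schema described later in this document; the function name acl_filter and the exact query shape are assumptions for illustration, not the plugin's actual implementation.

```python
def acl_filter(wiki_name):
    """Build a Lucene filter query that only matches documents the given
    user may view: either unrestricted ("all") or listing the user."""

    def quote(value):
        # Quote the value as a Lucene phrase, escaping backslashes and quotes.
        return '"' + value.replace("\\", "\\\\").replace('"', '\\"') + '"'

    principals = ["all", wiki_name]
    return "access_granted:(" + " OR ".join(quote(p) for p in principals) + ")"


if __name__ == "__main__":
    print(acl_filter("JohnDoe"))
```

A filter of this kind would be sent along with every query as a filter query, so that documents a user may not view never show up in the hit set or in facet counts.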

Setting up immediate indexing

Whenever a topic or attachment in Foswiki changes, Solr has to read the changed documents and update the index. This can be done immediately when a topic is saved, an attachment is uploaded, or something is moved to a different location. The following settings are accessible through configure and control how SolrPlugin behaves:

To enable/disable updates on every save:
$Foswiki::cfg{SolrPlugin}{EnableOnSaveUpdates} = 0;

Enable/disable updates when a new attachment has been uploaded:
$Foswiki::cfg{SolrPlugin}{EnableOnUploadUpdates} = 0;

Enable/disable updates when a topic or attachment has been moved or deleted:
$Foswiki::cfg{SolrPlugin}{EnableOnRenameUpdates} = 1;

All but the last are disabled by default. That's because updating Solr's index might take a noticeable amount of time when clicking on "save" in the wiki editor, even more so when the saved topic has a lot of attachments.

EnableOnRenameUpdates is enabled by default, as renaming or deleting is a relatively infrequent operation.

Setting up offline indexing

It is strongly recommended to fully reindex all of your documents regularly. Every 24 hours is a good interval. This can be done using a cron job like this one:

0 0 * * * <foswiki-dir>/tools/solrjob --mode full

Add --host all to index all virtual hosts, or --host <hostname> to index a single virtual host.

This will read all existing webs one by one and re-index the topics and attachments. Afterwards the index is optimized for size and performance.

You can also choose to delta index more frequently, by reindexing all documents that changed since the last time the delta indexing was performed. The simplest way to do this is to set up a cron job like this:

0-59/5 * * * * <foswiki-dir>/tools/solrjob --mode delta --host all

This will start solrindex in delta mode every 5 minutes, which is a reasonable trade-off between resource usage and keeping all content up to date in a timely manner.
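
The idea behind delta indexing can be sketched in a few lines of Python. This is not how solrjob is implemented; it only illustrates selecting what changed since the last run, assuming topics are stored as .txt files below the Foswiki data directory (as in the default file-based store). The function name changed_since is made up for this example.

```python
from pathlib import Path


def changed_since(data_dir, last_run):
    """Return all topic files modified after `last_run` (a Unix timestamp),
    sorted by path."""
    return sorted(
        path
        for path in Path(data_dir).rglob("*.txt")
        if path.stat().st_mtime > last_run
    )
```

A delta run would pass the timestamp of the previous run, reindex only the returned topics, and then store the current time for the next run.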

If you delta-index regularly, you probably don't need to enable EnableOnSaveUpdates, and vice versa.

Instead of waiting for cron to trigger the delta indexing job, you can use iwatch for near-realtime indexing. iwatch is available on Linux systems that implement the inotify kernel service. It uses the underlying operating system to trigger the solrjob script as soon as a file has changed. An example of an iwatch.xml file triggering a delta index job looks like this:

<?xml version="1.0" ?>
<!DOCTYPE config SYSTEM "/etc/iwatch/iwatch.dtd" >

<config>
  <guard email="root@localhost" name="IWatch"/>
  <watchlist>
    <title>Foswiki</title>
    <contactpoint email="root@localhost" name="Administrator"/>
    <path type="recursive" alert="off" syslog="on" exec="su www-data -c '<foswiki-dir>/tools/solrjob --host mydomain.com --file %f'"><vhosts-dir>/mydomain.com/data</path>
    <!-- add other virtual hosts here -->
    <path type="regexception">\.tmp|\.sw\w|\.svn|\.lease|\.lock|,$|\.changes|,v|^_[0-9]|^log|^Temporary|^UnitTestCheck</path>
  </watchlist>
</config>

Make sure to replace <foswiki-dir> and <vhosts-dir>, as well as the www-data user if your web server runs under a different account, with the appropriate values for your platform.

Usage

We recommend that you replace Foswiki's default AutoViewTemplatePlugin with Foswiki:Extensions/AutoTemplatePlugin. This will allow you to replace the default WebSearch, WebSearchAdvanced, WebChanges and SiteChanges with a Solr-driven interface for better usability and performance.

Configure AutoTemplatePlugin by adding the following {ViewTemplateRules}:

$Foswiki::cfg{Plugins}{AutoTemplatePlugin}{ViewTemplateRules} = {
...
  'WebSearchAdvanced' => 'SolrSearchView',
  'WebSearch' => 'SolrSearchView',
  'WebChanges' => 'WebChangesView',
  'SiteChanges' => 'SiteChangesView',
...
};

You might also override the WebSearch of an individual web using a rule along the following lines:

   'MyWeb.WebSearch' => 'SolrSearchView'

Faceted search interface

Todo: explain usage

Macros

SolrPlugin comes with a set of search macros tailored to the extensive capabilities of Solr's responses to search queries. All of them make use of the same set of options to render a response as listed in SOLRSEARCH.

SOLRSEARCH

This is the most important macro. It allows you to interact with the Solr server and display results within wiki applications. An example search looks like this:
%SOLRSEARCH{"test"
  format="   1 $web.$topic$n"
  sort="date desc"
}%

This will list the 10 most recently changed topics that match the string "test".

To list the 20 most recently changed topics that have the string "test" in their name, use:
%SOLRSEARCH{"topic_search:test"
  format="   1 $web.$topic$n"
  sort="date desc"
  rows="20"
}%
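
For debugging it can help to send a comparable query to the Solr server directly. The Python sketch below builds such a request URL using the standard Solr request parameters q, sort, rows, start and wt; how SOLRSEARCH maps its own options to these internally is an assumption here, and select_url is a hypothetical helper name. A raw request like this bypasses the access control filtering and result formatting that the macro adds.

```python
from urllib.parse import urlencode


def select_url(base, core, query, sort=None, rows=10, start=0):
    """Build a URL for the select handler of one Solr core."""
    params = {"q": query, "rows": rows, "start": start, "wt": "json"}
    if sort:
        params["sort"] = sort
    return f"{base.rstrip('/')}/{core}/select?{urlencode(params)}"


if __name__ == "__main__":
    # Counterpart of the second example above: newest 20 topics named *test*.
    print(select_url("http://localhost:8983/solr/", "foswiki",
                     "topic_search:test", sort="date desc", rows=20))
```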

SOLRSEARCH allows you to use the full power of the Lucene query language. This works with syntactically correct boolean queries like "title:foo OR body:foo". Consult the Lucene Query Syntax guide to learn more about how to form more complicated queries.
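
When such a query is assembled from user input, characters with a special meaning in the Lucene syntax need to be escaped first. The following Python sketch shows one way to do that; escape_term and any_field are illustrative names, and the field names are the ones from the example query above.

```python
import re

# Characters with a special meaning in the Lucene query syntax.
_SPECIAL = re.compile(r'([+\-&|!(){}\[\]^"~*?:\\/])')


def escape_term(text):
    """Backslash-escape Lucene special characters in a single search term."""
    return _SPECIAL.sub(r"\\\1", text)


def any_field(term, fields):
    """Build a boolean query matching `term` in any of the given fields.
    Terms containing whitespace are not handled; quote those as a phrase."""
    return " OR ".join(f"{field}:{escape_term(term)}" for field in fields)


if __name__ == "__main__":
    print(any_field("foo", ["title", "body"]))
    print(escape_term("C++"))
```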

SOLRSEARCH also allows you to run a query in dismax mode. The dismax query parser only supports a subset of the Lucene syntax, but is highly tolerant of all sorts of strange user input. The query syntax it uses is familiar to many search engine users, and supports +/- and quotes for grouping words. The edismax mode adds several more powerful features, though still short of what is offered by the full Lucene syntax.

Parameter Description Default
id a search can be cached optionally for the time of the current request, for example using id="solr1". further calls to %SOLRFORMAT can make use of the cached solr response to render it independent from the location of the %SOLRSEARCH call on the wiki page  
search query string: depending on the search type this can either be a free-form text (type=dismax), a valid lucene query (type=standard) or a combination of both (edismax) *:*
type dismax/edismax/standard: query type standard
fields list of fields to be returned in the result; by default all fields in solr documents are returned; communication between Foswiki and the solr search can be optimized by specifying only those fields that you are interested in while rendering the response *, score
Flags:
jump on/off: jump to the topic specified explicitly in the search string on
lucky on/off: jump to the first result found off
highlight switch on/off highlighting of found terms off
spellcheck switch on/off spellchecking to propose alternative spellings in case no search result was found off
Pagination:
start integer index within the result from where to start listing results 0
rows maximum number of documents to return 10
Filter parameters:
web filter by web: this can be any webname all
contributor filter by contributor to a topic  
filter lucene query to filter results  
extrafilter additional lucene filter query (see SolrSearchBaseTemplate for the difference between filter and extrafilter)  
reverse on/off - reverts sorting if switched on; note: this overrides sorting order specified in sort off
sort sorting expression; examples: score desc, date desc, createdate, topic_sort  
Dismax Parameter:
boostquery a raw query string (in the solr query syntax) that will be included with the user's query to influence the score. example: type:topic^1000 will boost results of type topic see solrconfig.xml and SolrSearchBaseTemplate
queryfields list of fields and their boosts giving each field a significance when a term was found in them. the format supported is fieldOne^2.3 fieldTwo fieldThree^0.4, which indicates that fieldOne has a boost of 2.3, fieldTwo has the default boost, and fieldThree has a boost of 0.4 ... this indicates that matches in fieldOne are much more significant than matches in fieldTwo, which are more significant than matches in fieldThree see solrconfig.xml and SolrSearchBaseTemplate
phrasefields list of fields and their boosts similar to queryfields. this parameter may contain fields and boosts that phrases (specified in quotes) are matched against. boosting those fields higher than their counterpart in queryfields allows you to prefer phrase matches over separate word matches see solrconfig.xml and SolrSearchBaseTemplate
Faceting:
facets list of facets to be rendered during search; each facet can be a title=name pair specifying the facet name and the title label used to display it in the result; example:
%MAKETEXT{"Webs"}%=web, %MAKETEXT{"Topic type"}%=field_TopicType_lst
 
facetquery query to be used for a facet query  
facetoffset used to page through a list of facets being returned by a search  
facetlimit maximum number of values to be displayed per facet; this is a list of pairs name=integer specifying a per-facet limit; example: 50, tag=100, contributor=10, category=10 will constrain the global limit of facet values to be returned to 50, tags to 100, and list the top 10 contributors in the hit set as well as the 10 most used categories 100
facetmincount minimum frequency of a facet to be included in the result 1
facetprefix prefix string of a facet to be included  
facetdatestart part of a date facet describing the start of a time interval NOW/DAY-7DAYS
facetdateend part of a date facet describing the end of a time interval NOW/DAY+1DAYS
facetdateother part of a date facet describing the time intervals excluding the one specified with facetdatestart and facetdateend before
hidesingle comma separated list of facets to be hidden if there's only one choice left  
disjunctivefacets list of facets that are queried using OR; so searching within one facet will expand the search instead of drilling down facet values are combined using AND
combinedfacets list of facets where values are queried in each of them using OR; for example listing field_ProjectMembers_lst and field_ProjectManager_s will result in a lucene filter of the form field_ProjectMembers_lst:WikiGuest OR field_ProjectManager_s:WikiGuest  
Formatting results:
correction format string for corrections proposed by the spellchecker Did you mean <a href='$url'>$correction</a>
header format string prepended to the result  
format format string used to render each hit in the result set  
separator format string used to separate hit results rendered using format  
footer format string appended to the result  
header_interesting format string prepended to more-like-this queries (see %SOLRSIMILAR)  
format_interesting format string used to render more-like-this results  
separator_interesting format string used to separate hit results in more-like-this queries  
footer_interesting format string appended to more-like-this queries  
include_interesting regular expression terms must match in a more-like-this result  
exclude_interesting regular expression terms must not match in a more-like-this result  
header_<facet> format string prepended to a facet result  
format_<facet> format string used to render a facet value  
separator_<facet> format string used to separate facet values  
footer_<facet> format string appended to facet results  
include_<facet> regular expression facet values must match to be displayed  
exclude_<facet> regular expression facet values must not match to be displayed  
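
The list formats of queryfields and facetlimit in the table above can be made concrete with a small parser. This is a Python sketch for illustration only, not the plugin's own code; the function names are invented, the default boost of 1.0 is Solr's standard default, and the fallback global facet limit of 100 is the default listed in the table.

```python
def parse_boosts(spec, default=1.0):
    """Parse 'fieldOne^2.3 fieldTwo fieldThree^0.4' into {field: boost}.
    Fields without an explicit ^boost get the default boost."""
    boosts = {}
    for item in spec.split():
        name, sep, boost = item.partition("^")
        boosts[name] = float(boost) if sep else default
    return boosts


def parse_facetlimit(spec, global_default=100):
    """Parse '50, tag=100, contributor=10' into (global_limit, per_facet).
    A bare integer sets the global limit; name=integer sets one facet."""
    global_limit = global_default
    per_facet = {}
    for item in spec.split(","):
        item = item.strip()
        if not item:
            continue
        name, sep, value = item.partition("=")
        if sep:
            per_facet[name.strip()] = int(value)
        else:
            global_limit = int(item)
    return global_limit, per_facet


if __name__ == "__main__":
    print(parse_boosts("fieldOne^2.3 fieldTwo fieldThree^0.4"))
    print(parse_facetlimit("50, tag=100, contributor=10, category=10"))
```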

SOLRFORMAT

When a solr response has been cached using the id parameter to SOLRSEARCH, it can be reused by subsequent calls to %SOLRFORMAT.

%SOLRSEARCH{"test" 
  id="solr1"
  facets="web,contributor"
  facetlimit="web=10, contributor=10"
}%

<noautolink>
*Results:*
%SOLRFORMAT{"solr1"
  format="   1 [[$web.$topic][$topic]]$n"
}%

*Webs:*
%SOLRFORMAT{"solr1"
  format_web="   * $key ($count)$n"
}%

*Contributors:*
%SOLRFORMAT{"solr1"
  format_contributor="   * $key ($count)$n"
  exclude_contributor="UnknownUser|AdminGroup|AdminUser|RegistrationAgent|TestUser"
}%
</noautolink>

SOLRSIMILAR

SOLRSIMILAR returns a list of topics similar to the current one.

Parameter Description Default
"..." query string referencing the document(s) to return similar ones for id:Extensions.SolrPlugin
like list of fields used to compute similarity category, tag
fields list of fields and their boost value to be included in result items web, topic, title, score
filter restricts results to those matching this filter type:topic
include switches on/off inclusion of the matched document found in the query parameter off
limit maximum number of results to return 100
boost    
mintermfrequency    
mindocumentfrequency    
mindwordlength    
maxdwordlength    

%SOLRSIMILAR{"id:Extensions.SolrPlugin"
  filter="web:Extensions type:topic"
  fields="web,topic,title,score"
  header="Similar Topics$n"
  footer=""
  format="   * $percntDBCALL{\"Applications.RenderTopicThumbnail\" OBJECT=\"$web.$topic\" TYPE=\"plain\" }$percnt $title $percntDBQUERY{ header=\"\" topic=\"$web.$topic\" format=\"$formfield(Summary)\" footer=\"\" }$percnt"
  separator="$n"
  rows="5"
}%

Todo: document SOLRSCRIPTURL, the REST interface (search, terms, similar, autocomplete), the command-line tools (solrstart, solrindex, solrdelete) and the Perl interface (registerIndexTopicHandler(), registerIndexAttachmentHandler()).

    Solr indexing schema

    SolrPlugin comes with a custom schema to index general Foswiki data as defined in the <solr-home-dir>conf/schema.xml file. It offers support for generic DataForm values, so any new DataForm definition lets you use its formfields for faceting directly, without changing the configuration or having to reindex the content.

    The process of indexing content is configured on the Foswiki side which will crawl all webs, topics and their attachments thus creating lucene documents which are then sent over to the solr server. A lucene document is made up of fields of a certain type which defines the way the document should be processed by the solr server. This is configured in the schema.xml file.

    While the schema is able to cover all Foswiki related data it is still kept generic enough to be used for non-wiki content as well. Different kinds of content are distinguished using the collection field (see below).

    Field types

    This is the list of the most common field types used in the default schema. See the schema.xml for more exotic field types like point and location, useful for spatial search.

    Type Description
    string not analyzed, but indexed/stored verbatim
    boolean boolean values (true, false)
    binary the data should be sent/retrieved as Base64-encoded strings
    int, float, long, double default numeric field types. for faster range queries, consider the tint/tfloat/tlong/tdouble types
    date the format for this date field is of the form 1995-12-31T23:59:59Z, and is a more restricted form of the canonical representation of dateTime. The trailing "Z" designates UTC time and is mandatory. Optional fractional seconds are allowed: 1995-12-31T23:59:59.999Z All other components are mandatory. Note: for faster range queries, consider the tdate type
    text_ws a text field that only splits on whitespace for exact matching of words
    text a general text field that has reasonable, generic cross-language defaults: it tokenizes with StandardTokenizer, removes stop words from case-insensitive "stopwords.txt", and down cases. At query time only, it also applies synonyms.
    text_generic same as text but also splits words on case change while generating word parts. a general unstemmed text field - good if one does not know the language of the field. this field type is useful when searching for parts of a WikiWord
    text_substring same as text_generic but with substring decomposition
    text_spell generic text analysis for spell checking
    text_sort this is a text field suitable for sorting alphabetically
    text_rev a general unstemmed text field that indexes tokens normally and also reversed, to enable more efficient leading wildcard queries.
    type a text field used to analyse different content types; type mappings are defined in typemaping.txt; for example, that's where all image file extensions are mapped to "image", same for "video"
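
The date format described in the table above can be produced as follows. This is a minimal Python sketch; the function name to_solr_date is made up, and fractional seconds (which the format allows) are simply dropped here.

```python
from datetime import datetime, timedelta, timezone


def to_solr_date(moment):
    """Format a timezone-aware datetime as 1995-12-31T23:59:59Z (UTC)."""
    if moment.tzinfo is None:
        raise ValueError("expected a timezone-aware datetime")
    return moment.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


if __name__ == "__main__":
    # A local time one hour ahead of UTC is converted before formatting.
    local = datetime(1996, 1, 1, 0, 59, 59, tzinfo=timezone(timedelta(hours=1)))
    print(to_solr_date(local))
```

Converting to UTC before formatting matters because the trailing "Z" is mandatory and always designates UTC.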

    Fields

    Name Type Multivalued Stored Description
    timestamp tdate   stored time when the document was added to the index
    spell text_spell multivalued   used for spellchecking
    id string   stored unique identifier for each document; this is the external id usable in applications; there's an internal solr document id not related to this field
    collection string   stored identifies a set of documents coming from the same content collection; by default all content stored in Foswiki (topics and attachments) is gathered in the wiki collection set in Foswiki::cfg{SolrPlugin}{DefaultCollection}
    language string   stored language of the current document; this may be specified explicitly using the CONTENT_LANGUAGE preference, or set to "detect" to let the solr update chain detect the language automatically
    url string   stored url used to access the document being indexed
    type type   stored holds the type facet of the document; this is "image" for all kinds of images, "video" for all kinds of videos, "topic" for Foswiki topics and the verbatim file extension for everything else; note: plugins like Foswiki:Extensions/MetaCommentPlugin might use specific types as well (like "comment" in this case)
    web string   stored name of the web this document is located in
    topic string   stored name of the topic
    webtopic string   stored concatenation of the web and topic part
    access_granted string multivalued   this field controls access of users to this topic or attachment in the search index; every query is augmented with an ACL check against this field; only users listed in this field are allowed view rights; special value is "all" when there are no view restrictions
    title string   stored title of a document; a topic title is read from a TopicTitle formfield, a TOPICTITLE preference variable or defaults to the topic name itself; for attachments this is the filename with the extension stripped off
    summary text_generic   stored this is a plainified summary of the topic text
    author string   stored the name of the user that changed the document most recently
    contributor string multivalued stored list of users that contributed to this topic at some point in time
    date tdate   stored time the document was last changed
    version float     current version of the topic
    text text_generic     document text
    createauthor string   stored author of the initial version of this document
    createdate tdate   stored date when the initial version of this document was created
    catchall text_generic multivalued stored copy-field that gathers content from (almost) all fields; this is the default search field for the "standard" query parser; note that fields to be queried can be configured per request using the "dismax" handler
    substrings text_substring multivalued   holds substring analysis of the most important search fields
    phonetic phonetic multivalued   holds the phonetic analysis of the most important search fields
    state string     used by comments or any other application that tracks specific states of a document, such as "new", "unapproved", "approved", "draft", "unpublished", "published", ...
    parent string   stored parent topic of the current topic
    form string   stored name of the form attached to the current topic
    preference string multivalued stored this field catches all topic preferences. each preference is captured in a dynamic field as well (see dynamic fields below)
    attachment string multivalued stored list of all attachment names of this topic
    outgoing string multivalued stored list of all outgoing links; this information is used to detect backlinks
    category string multivalued stored list of categories this document is in; note: this field will only be used if Foswiki:Extensions/ClassificationPlugin is installed; it will populate it with the list of all categories up to TopCategory; content of this field is copied to category_search as well (see copy fields below)
    tag string multivalued stored list of tags assigned to this document; note: this field will only be used if Foswiki:Extensions/ClassificationPlugin is installed; content of this field is copied to tag_search as well (see copy fields below)
    name string   stored filename of an attachment
    comment text_generic   stored comment field of an attachment
    size tint   stored size of an attachment in bytes

    Dynamic fields

    Dynamic fields are generated based on the content properties of the document to be indexed. Fields are specified using a wildcard pattern in schema.xml. When a document is indexed, the wildcard is expanded to create a proper field name. Dynamic fields let you apply specific ways of analyzing fields based on their name, and they cover fields that aren't known in advance, such as the names of all formfields of any DataForm that could ever be invented.

    When SolrPlugin is about to index a DataForm attached to a topic, it tries to guess the data type of each formfield. Normally, Foswiki does not specify any type information within a DataForm definition. Exceptions are (1) date: these are mapped to a *_dt field and (2) checkbox, select, radio, textboxlist: these are potentially multi-value fields and are thus indexed in a *_lst field.

    Every other formfield is stored into an *_s field as well as into a *_search field. The former captures the exact content while the latter analyses the text more thoroughly optimized for fuzzy searching.

    DataForm formfields are mapped to lucene document fields by prepending the field_* prefix to prevent name clashes with other dynamic fields generated on the fly. So for example a formfield ProjectManager will be stored in field_ProjectManager_s and field_ProjectManager_search. Likewise a select+multi formfield ProjectMembers will be stored in field_ProjectMembers_lst as it is a multivalued field.

    If a formfield name already comes with one of the below suffixes (_i, _l, _f, _dt, etc) then this suffix will be used instead of any heuristics trying to derive the best field type for the lucene field. That way DataForm fields although untyped by Foswiki can be indexed type-specific nevertheless.

    Similarly topic preferences are indexed using a preference_* prefix.

    Name Type Multivalued Stored Description
    *_i tint   stored fields with a _i suffix are indexed as an integer number
    *_l tlong   stored fields with a _l suffix are indexed as a long integer
    *_f tfloat   stored fields with a _f suffix are indexed as a float
    *_d tdouble   stored fields with a _d suffix are indexed as a double precision float
    *_b boolean   stored true, false
    *_s string   stored dynamic field for unanalyzed text
    *_t text_generic   stored generic text
    *_dt tdate   stored a dateTime value
    *_lst string multivalued stored this field is used for any multi-valued formfield in DataForms like, select, radio, checkbox, textboxlist
    preference_* string   stored preference values such as preference_NAMEOFPREFERENCE_t
    *_search text_generic   stored generic text, optimized for searching
    *_sort text_sort   stored text optimized for sorting alphabetically
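
The mapping rules for DataForm formfields described above can be summarized in code. The following Python sketch is an illustration under assumptions, not the plugin's indexer: the function name solr_fields is invented, and the list of recognized suffixes is taken from the table above because the text only names some of them ("_i, _l, _f, _dt, etc").

```python
# Suffixes taken from the dynamic field table; the exact set the indexer
# recognizes is an assumption.
KNOWN_SUFFIXES = ("_i", "_l", "_f", "_d", "_b", "_s", "_t", "_dt",
                  "_lst", "_search", "_sort")

# Formfield types that may hold more than one value.
MULTI_VALUE_TYPES = {"checkbox", "select", "radio", "textboxlist"}


def solr_fields(name, formfield_type):
    """Return the Solr field names a DataForm formfield is indexed into."""
    if name.endswith(KNOWN_SUFFIXES):
        # An explicit suffix in the formfield name overrides any guessing.
        return [f"field_{name}"]
    base_type = formfield_type.split("+")[0]  # "select+multi" -> "select"
    if base_type == "date":
        return [f"field_{name}_dt"]
    if base_type in MULTI_VALUE_TYPES:
        return [f"field_{name}_lst"]
    # Everything else: exact value plus a copy analysed for searching.
    return [f"field_{name}_s", f"field_{name}_search"]


if __name__ == "__main__":
    print(solr_fields("ProjectManager", "text"))
    print(solr_fields("ProjectMembers", "select+multi"))
```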

    Copy fields

    Finally, after all field types have been defined, some fields are created by copying a source field to a destination field using the copyField feature of solr. So while most of a lucene document to be indexed is created explicitly by the crawler and indexer, some fields are created automatically to facilitate specific search applications. The destination fields are then analysed using the dynamic field definitions given above.

    Source Destination
    web web_sort
    topic topic_sort
    title title_sort
    category category_search
    tag tag_search
    title title_search
    topic topic_search
    web web_search
    webtopic webtopic_search
    attachment catchall
    category catchall
    comment catchall
    field_* catchall
    form catchall
    name catchall
    tag catchall
    text catchall
    title catchall
    topic catchall
    type catchall
    state catchall
    attachment substrings
    category substrings
    comment substrings
    contributor substrings
    field_* substrings
    form substrings
    name substrings
    tag substrings
    text substrings
    title substrings
    topic substrings
    type substrings
    attachment phonetic
    category phonetic
    comment phonetic
    contributor phonetic
    field_* phonetic
    form phonetic
    name phonetic
    tag phonetic
    text phonetic
    title phonetic
    topic phonetic
    type phonetic
    attachment spell
    comment spell
    field_* spell
    form spell
    name spell
    text spell
    title spell
    topic spell
    web spell
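
    The copy semantics described above can be illustrated with a small Python sketch. It is only a simulation under stated assumptions, not Solr's implementation: in Solr itself the rules are declared in schema.xml as copyField elements with source and dest attributes. The function name apply_copy_fields, the rule subset, and the sample document (including the field name field_Author_s) are hypothetical and chosen for illustration only.

```python
from fnmatch import fnmatchcase

# A few (source, destination) pairs taken from the table above.
# The full table has many more entries; this subset is illustrative.
COPY_RULES = [
    ("web", "web_sort"),
    ("title", "title_sort"),
    ("title", "title_search"),
    ("title", "catchall"),
    ("field_*", "catchall"),
]


def apply_copy_fields(doc, rules=COPY_RULES):
    """Return a new document with destination fields filled in.

    doc maps field names to lists of values. Sources are matched
    against the ORIGINAL document only, so copies are never chained
    (a copied field is not itself used as a source).
    """
    result = {name: list(values) for name, values in doc.items()}
    for source, dest in rules:
        for name, values in doc.items():
            # A source may be a glob such as "field_*".
            if fnmatchcase(name, source):
                result.setdefault(dest, []).extend(values)
    return result


if __name__ == "__main__":
    sample = {
        "web": ["Main"],
        "title": ["Solr Plugin"],
        "field_Author_s": ["MichaelDaum"],  # hypothetical formfield
    }
    for field, values in sorted(apply_copy_fields(sample).items()):
        print(field, values)
```

    Several sources feeding one destination, as with catchall, simply accumulate their values there, which is why such a destination has to be multi-valued.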

    Dependencies

    Name Version Description
    Foswiki::Contrib::InfiniteScrollContrib >=1.0 Optional
    Foswiki::Contrib::JQMomentContrib >=1.0 Required
    Foswiki::Contrib::JQPrettyPhotoContrib >=1.0 Required
    Foswiki::Contrib::JQSerialPagerContrib >=1.0 Required
    Foswiki::Contrib::JQTwistyContrib >=1.0 Required
    Foswiki::Contrib::StringifierContrib >=1.20 Required
    Foswiki::Plugins::AutoTemplatePlugin >=1.0 Optional
    Foswiki::Plugins::ClassificationPlugin >=1.0 Optional
    Foswiki::Plugins::DBCachePlugin >=1 Optional
    Foswiki::Plugins::FilterPlugin >=2.0 Required
    Foswiki::Plugins::ImagePlugin >=3.0 Required
    Foswiki::Plugins::MimeIconPlugin >=0 Required
    Foswiki::Plugins::TagCloudPlugin >=1.0 Required
    Foswiki::Plugins::FlexWebListPlugin >=1.91 Required
    JSON::XS >=2.231 Required
    LWP::UserAgent >=5.820 Required
    Any::Moose >=0.17 Required
    XML::Easy >0 Required
    HTML::Entities >=3.64 Required
    File::MMagic >0 Required
    DBI >=1 Required

    Plugin Info

    Author: Foswiki:Main.MichaelDaum
    Copyright: © 2009-2014, Michael Daum http://michaeldaumconsulting.com
    License: GPL (GNU General Public License)
    Release: 2.10
    Version: 2.10
    Home: Foswiki:Extensions/SolrPlugin
    Support: Foswiki:Support/SolrPlugin
    Change History:  
    28 May 2014: implemented new ACL style compatible with Foswiki >= 1.2
    14 Jul 2013: added support for PiwikPlugin
    14 Mar 2013: improved indexing performance; added configurable HTTP timeouts when talking to the Solr backend; fixed language mappings for multilingual content; fixes due to latest changes in jquery.moment
    17 Oct 2011: fixed WebService::Solr to only encode to utf8 if needed; fixed handling character encoding on a pure utf8 foswiki; fixed schema for spell correction
    29 Sep 2011: improved schema.xml: replaced StandardTokenizer with WhitespaceTokenizer, using new ClassicTokenizer and ClassicFilter to feed the spellchecker, switched spellchecker to JaroWinklerDistance and lowered the frequency threshold for a term to be added to the spellchecker; building the spellchecker when optimizing the index now; fixed detecting the content language
    28 Sep 2011: added multilanguage support per document; fixed default values in %SOLRSIMILAR; speeding up indexing by better caching ACLs; implemented mapping facet values to any other label during query time; added Language facet to default search interface
    26 Sep 2011: improved default boosting in dismax to prefer topic hits a lot stronger than attachments; improved default cache settings for better default performance; added support to distribute updates and search in a master-slave setup; added boostquery, queryfields, phrasefields parameter to customize boosting and sorting; improved default schema while documenting it
    21 Sep 2011: upgrading to solr-3.4.0; fixed utf8 handling; added jump and i-feel-lucky options; made hidesingle configurable per facet; added disjunctivefacets and combinedfacets; fixed handling of date fields; support new ui::autocomplete in JQueryPlugin; using type-specific icons in Foswiki:Extensions/MimeIconPlugin if installed; fixed quoting lucene queries; indexing outgoing links to support fast backlinks; adding fields createauthor, language and collection to schema; disabling phonetic boost in schema by default; be more robust in case of malformed DataForm definitions; also copying every string field into a search field to allow exact as well as fuzzy search; enhancing normalizeWebTopicName to create uniform web names using dots, not slashes everywhere; fixed parsing inline topic permissions; externalized sidebar pager into a new plugin of its own: Foswiki:Extensions/JQSerialPagerContrib; upgrading to WebService::Solr-0.14 ... which now requires CPAN:XML::Easy instead of CPAN:XML::Generator; lots of improvements to SolrSearchBaseTemplate; now supporting Foswiki:Extensions/InfiniteScrollContrib in SolrSearch; documentation improvements
    19 Apr 2011: shipping a multicore setup by default; added support for Foswiki:Extensions/VirtualHostingContrib; fixed utf8 recoding; some usability improvements to faceted search interface; fixing illegal control characters in output (Oliver Schaub)
    16 Dec 2010: added state field to schema used for approval workflows; added solrjob to ease cronjobbing indexing; added documentation on how to use iwatch for almost-realtime indexing; fixed dependencies to include Foswiki:Extensions/FilterPlugin as well; fixed mapping facet values to their display title in search interface; fixed delta updates not properly removing outdated attachment entries when these were moved/renamed; and some minor html improvements
    03 Dec 2010: fixed solr-based WebChanges and SiteChanges using PatternSkin
    01 Dec 2010: adjustments due to changes in stringifier api; fixed removal of deleted webs from search index
    22 Nov 2010: fixes integration with pattern skin
    18 Nov 2010: initial public release

    Topic revision: r15 - 28 May 2014, MichaelDaum
     