Solr/Crap
From charlesreid1
Revision as of 07:08, 8 June 2012
Solr is a search engine server that you talk to over HTTP. Documents are sent to it as XML (or JSON), queries are plain HTTP requests, and results come back as XML or JSON.
I'm trying to use it to create a searchable database of text files.
Installation
Download it and compile it by using Ant (a Java-based make program):
$ wget http://mirror.metrocast.net/apache/lucene/solr/3.6.0/apache-solr-3.6.0-src.tgz
$ tar xzf apache-solr-3.6.0-src.tgz
$ cd apache-solr-3.6.0
$ ant ivy-bootstrap # this installs ivy, an Ant dependency
$ ant compile
It'll take a couple of minutes to finish.
Test
You can test everything by running
$ ant test
Making War
Make a .war file by doing this:
$ cd /path/to/apache-solr-3.6.0/solr
$ ant dist
Again, this will take a while.
Making Example
Make the Ant example by typing
$ cd /path/to/apache-solr-3.6.0/solr
$ ant example
Running Solr on a Web Server
Using Jetty (Default)
To run Solr, you need a Java web server (servlet container) running locally. The Solr example is distributed with Jetty, a lightweight Java web server. After you've run the commands above and built the Solr example, go to the example directory (/path/to/apache-solr-3.6.0/solr/example) and type:
$ java -jar start.jar
This starts the Jetty server with Solr running inside it. Visiting http://localhost:8983/solr/admin should bring up the Solr admin page.
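Jetty can take a few seconds to come up. If you script the startup, a small polling loop can wait for the admin page before moving on. This is a sketch that assumes curl is installed; wait_for_url and its retry count are hypothetical choices, not part of Solr.

```shell
# Poll a URL until it answers without an HTTP error, or give up.
# Assumes curl is installed; wait_for_url is a hypothetical helper.
# The body is a ( ... ) subshell, so its variables stay local and
# "exit" only leaves the function.
wait_for_url() (
  url=$1
  tries=${2:-30}   # number of attempts, one second apart
  i=0
  while :; do
    # -f: fail on HTTP errors, -s: quiet, --max-time: never hang forever
    if curl -fs --max-time 5 -o /dev/null "$url"; then
      exit 0
    fi
    i=$((i + 1))
    [ "$i" -ge "$tries" ] && exit 1
    sleep 1
  done
)

# Example, after starting Jetty in the background:
#   java -jar start.jar &
#   wait_for_url http://localhost:8983/solr/admin/ && echo "Solr is up"
```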
Using Tomcat
You can also run Solr in Tomcat, a Java-based HTTP server (servlet container) from the Apache Software Foundation. Don't confuse it with the more common C-based Apache HTTP server. See my Tomcat page for instructions on installing and running Tomcat.
Download Solr
See above.
Build Solr and Solr example
See above.
Create Solr user in Tomcat
In $CATALINA_HOME/conf/tomcat-users.xml, define a new admin user for Solr inside the <tomcat-users> element (and pick a real password):
<role rolename="manager"/>
<role rolename="admin"/>
<user username="admin" password="password" roles="manager,admin"/>
Create standalone Solr example directory
You will want to create a standalone directory that holds your Solr example. Tomcat will run a particular instance of Solr out of this standalone directory. I used /opt/solr.
Now you'll copy the Solr example that you built above into /opt/solr:
$ mkdir -p /opt/solr
$ cp -r /path/to/apache-solr-3.6.0/solr/example /opt/solr/.
Specify Solr Data Directory
To tell Solr where to keep its index data, edit /opt/solr/example/solr/conf/solrconfig.xml. Change the dataDir element so that it points at the standalone example's data directory:
<dataDir>${solr.data.dir:/opt/solr/example/solr/data}</dataDir>
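If you'd rather script this edit than make it by hand, a sed substitution can rewrite the existing dataDir element. This is a sketch that assumes the element sits on a single line, as it does in the stock example. It also assumes the path contains no "|" or "&". set_data_dir is a hypothetical helper; sed keeps the original file as solrconfig.xml.bak.

```shell
# Point the <dataDir> element of solrconfig.xml at a new default directory.
# Assumes the element is on one line and the path has no "|" or "&".
# set_data_dir is a hypothetical helper; sed keeps a FILE.bak backup.
set_data_dir() {
  # Single quotes keep ${solr.data.dir:...} literal, so Solr can still
  # override the default with -Dsolr.data.dir=... at startup.
  sed -i.bak 's|<dataDir>.*</dataDir>|<dataDir>${solr.data.dir:'"$2"'}</dataDir>|' "$1"
}

# Example:
#   set_data_dir /opt/solr/example/solr/conf/solrconfig.xml /opt/solr/example/solr/data
```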
Tell Tomcat How To Run Solr
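You tell Tomcat how to deploy Solr by creating a context fragment whose docBase attribute points at the Solr .war file. The ant example target normally puts the war at example/webapps/solr.war, so either copy it to the docBase path used below or change docBase to match. Create this file: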
$CATALINA_HOME/conf/Catalina/localhost/solr-example.xml
with the following contents:
<?xml version="1.0" encoding="utf-8"?>
<Context docBase="/opt/solr/example/solr/solr.war" debug="0" crossContext="true">
  <Environment name="solr/home" type="java.lang.String" value="/opt/solr/example/solr" override="true"/>
</Context>
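The same fragment can be written from the shell with a here-document. This sketch assumes CATALINA_HOME is set. SOLR_WAR and SOLR_HOME are illustrative override variables whose defaults are the paths used on this page. Tomcat takes the context path from the file name, which is why the admin page lives under /solr-example.

```shell
# Write Tomcat's context fragment for the Solr example.
# Assumes CATALINA_HOME is set. SOLR_WAR and SOLR_HOME are illustrative
# overrides; the defaults are the paths used on this page.
write_solr_context() (
  : "${CATALINA_HOME:?CATALINA_HOME must be set}"
  war=${SOLR_WAR:-/opt/solr/example/solr/solr.war}
  home=${SOLR_HOME:-/opt/solr/example/solr}
  dir=$CATALINA_HOME/conf/Catalina/localhost
  mkdir -p "$dir" || exit 1
  # Tomcat derives the context path from the file name: /solr-example
  cat > "$dir/solr-example.xml" <<EOF
<?xml version="1.0" encoding="utf-8"?>
<Context docBase="$war" debug="0" crossContext="true">
  <Environment name="solr/home" type="java.lang.String" value="$home" override="true"/>
</Context>
EOF
)

# Example:
#   write_solr_context
```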
Test It Out
Start up the Tomcat server:
$ $CATALINA_HOME/bin/startup.sh
and go to
http://localhost:8080/solr-example/admin
and you should see the Solr admin page.
Using Solr
So you've got Solr up and running, but you don't have any data. Let's fix that.
This section follows Apache's Solr tutorial: http://lucene.apache.org/solr/api/doc-files/tutorial.html
The example that we built created some sample XML documents that can be used as inputs to Solr. Go to /opt/solr/example/exampledocs to have a look.
Indexing Data
You will want to start by indexing data. The exampledocs folder contains post.jar, a small Java program that POSTs XML documents to Solr's update handler, which is how they get indexed. Run the commands below from inside exampledocs.
If you're running Solr on Tomcat, pass your context's update URL to post.jar with -Durl:
$ java -Durl=http://localhost:8080/solr-example/update -jar post.jar solr.xml monitor.xml
If you're running Solr on Jetty, post.jar already defaults to http://localhost:8983/solr/update, so you can simply run:
$ java -jar post.jar solr.xml monitor.xml
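With the example documents indexed, you can query them over plain HTTP. Here is a small sketch, not part of Solr itself, that builds a URL for the /select request handler. The query goes in the URL-encoded q parameter, and wt=json asks for JSON instead of the default XML. urlencode and solr_select_url are hypothetical helper names, and the encoder assumes ASCII input.

```shell
# Percent-encode a string for a URL query parameter (ASCII input assumed).
urlencode() (
  s=$1
  out=
  while [ -n "$s" ]; do
    rest=${s#?}          # everything after the first character
    c=${s%"$rest"}       # the first character
    s=$rest
    case $c in
      [[:alnum:].~_-]) out=$out$c ;;
      *) out=$out$(printf '%%%02X' "'$c") ;;   # e.g. ":" becomes %3A
    esac
  done
  printf '%s\n' "$out"
)

# Build a /select URL for a Solr base URL; wt defaults to json.
solr_select_url() (
  base=$1          # e.g. http://localhost:8983/solr or http://localhost:8080/solr-example
  q=$2
  wt=${3:-json}
  printf '%s/select?q=%s&wt=%s\n' "$base" "$(urlencode "$q")" "$wt"
)

# Example (needs a running Solr and curl); *:* matches every document:
#   curl "$(solr_select_url http://localhost:8983/solr '*:*')"
```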
Indexing HTML/TXT Files
To index HTML and TXT files, you need to edit the search engine's schema configuration file. This is located in /opt/solr/example/solr/conf/schema.xml. I recommend making a backup copy before you touch it.
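A timestamped copy makes it easy to diff against or roll back to the original schema. A minimal sketch; backup_file is a hypothetical helper name.

```shell
# Copy a file to FILE.YYYYmmdd-HHMMSS.bak and print the backup's path.
# backup_file is a hypothetical helper name.
backup_file() (
  dest=$1.$(date +%Y%m%d-%H%M%S).bak
  cp -p "$1" "$dest" && printf '%s\n' "$dest"
)

# Example:
#   backup_file /opt/solr/example/solr/conf/schema.xml
```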
Add a data field for the text
Find some lines that specify various data fields:
<field name="category" type="text_general" indexed="true" stored="true"/>
<field name="content_type" type="string" indexed="true" stored="true" multiValued="true"/>
<field name="last_modified" type="date" indexed="true" stored="true"/>
<field name="links" type="string" indexed="true" stored="true" multiValued="true"/>
and add a field of your own for the body of the HTML or text document. Its type must be one that schema.xml actually declares as a fieldType. text_general, which the category field already uses, is the example schema's general-purpose text type:
<field name="category" type="text_general" indexed="true" stored="true"/>
<field name="content_type" type="string" indexed="true" stored="true" multiValued="true"/>
<field name="last_modified" type="date" indexed="true" stored="true"/>
<field name="links" type="string" indexed="true" stored="true" multiValued="true"/>
<field name="body" type="text_general" indexed="true" stored="true"/>
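Solr won't load a schema in which a field names a type that is not declared as a fieldType, so check the type you chose before restarting. This sketch just greps schema.xml for a matching declaration, accepting both the fieldType and fieldtype spellings. It assumes name is the first attribute on the declaration's line (as in the stock example), and it doesn't skip commented-out declarations. field_type_defined is a hypothetical helper name.

```shell
# Succeed if schema.xml declares <fieldType name="TYPE" ...> (or <fieldtype ...>).
# Assumes name is the first attribute on the same line as the tag.
# field_type_defined is a hypothetical helper name.
field_type_defined() {
  grep -Eq "<field[Tt]ype[[:space:]]+name=\"$2\"" "$1"
}

# Example:
#   field_type_defined /opt/solr/example/solr/conf/schema.xml text_general && echo declared
```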
Add a copy action for the text
Now that a "body" field exists, add a copyField rule that copies anything put into "body" into the existing catch-all "text" field. That way, body content becomes searchable along with everything else that already feeds into "text".
Find the lines that specify copy actions:
<copyField source="includes" dest="text"/>
<copyField source="manu" dest="manu_exact"/>
and add a new copyField action:
<copyField source="includes" dest="text"/>
<copyField source="manu" dest="manu_exact"/>
<copyField source="body" dest="text"/>
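With the body field and copyField in place, a plain-text file can be indexed by wrapping it in Solr's XML update format (<add><doc>...</doc></add>). You then post it with post.jar, just like the example documents. The sketch below is a hypothetical helper that makes several assumptions:
- The schema's required id field is the unique key, and the file name serves as the id.
- The input is UTF-8 text.
- Only &, < and > need escaping.

It does not strip HTML tags, so an HTML file would be indexed with its markup.

```shell
# Escape text for use inside an XML element (&, <, >).
xml_escape() {
  sed -e 's/&/\&amp;/g' -e 's/</\&lt;/g' -e 's/>/\&gt;/g'
}

# Print a Solr <add> message for one text file on stdout.
# Assumes a required "id" field (file name used as id) plus the "body" field.
# text_to_solr_xml is a hypothetical helper name.
text_to_solr_xml() (
  file=$1
  id=$(basename "$file" | xml_escape)
  printf '<add>\n<doc>\n'
  printf '  <field name="id">%s</field>\n' "$id"
  printf '  <field name="body">'
  xml_escape < "$file"
  printf '</field>\n</doc>\n</add>\n'
)

# Example (notes.txt is a hypothetical input file; Jetty default URL):
#   text_to_solr_xml notes.txt > notes.xml
#   java -jar post.jar notes.xml
```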
(More later; this tutorial is still in progress.)