Search engine: apache-solr

刺骨的言语ヽ痛彻心扉 2022-09-30 14:48

SOLR

1. Solr server setup

• Java environment setup

Download the Linux JDK 6 from this website:

http://java.sun.com/javase/downloads/index.jsp

After installing the JDK, edit /etc/profile and add the following lines to the end of the file:

JAVA_HOME=/usr/java/jdk1.6.0_16
PATH=$JAVA_HOME/bin:$PATH
CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar
export JAVA_HOME
export PATH
export CLASSPATH

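/usr/java/jdk1.6.0_16 is the JDK installation directory; change it if you installed the JDK somewhere else. The settings take effect in new login shells; run "source /etc/profile" to apply them to the current shell.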

• Solr setup

1. Download Solr (apache-solr-1.3.0.zip) from this website:

http://ftp.kddilabs.jp/infosystems/apache/lucene/solr/

2. Install Solr with the following steps:

  1. #unzip -q apache-solr-1.3.0.zip
  2. #cd apache-solr-1.3.0/example/
  3. #java -jar start.jar
  4. Check that Solr is running by loading http://localhost:8983/solr/admin/ in a web browser. This admin page is the main starting point for administering Solr.
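The same check can be done from PHP, the language used for the client code later in this post. This is a minimal sketch, assuming allow_url_fopen is enabled (the PHP default); solr_admin_reachable is a hypothetical helper name, and it only tests that the admin page URL from step 4 answers with a successful HTTP status.

```php
<?php
// Hypothetical helper: returns true if the Solr admin page answers successfully
// within $timeout seconds. Defaults match the example server started above.
function solr_admin_reachable($host = 'localhost', $port = 8983, $timeout = 2)
{
    $url = "http://{$host}:{$port}/solr/admin/";
    // Short timeout so a stopped server does not hang the script.
    $context = stream_context_create(array('http' => array('timeout' => $timeout)));
    // file_get_contents() returns false on connection errors and on 4xx/5xx
    // replies; @ suppresses the warning PHP emits in those cases.
    $body = @file_get_contents($url, false, $context);
    return $body !== false;
}

echo solr_admin_reachable() ? "Solr is up\n" : "Solr is not reachable\n";
```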

The official Solr tutorial is at http://lucene.apache.org/solr/tutorial.html.

2. Searching Apache Solr with PHP

A tutorial with PHP Solr client examples is available here:

http://www.ibm.com/developerworks/opensource/library/os-php-apachesolr/

We use the PHP Solr Client to access the Solr server. Download it from this website: http://code.google.com/p/solr-php-client/downloads/list

• Changing the default Solr index schema

The Solr index schema is defined in apache-solr-1.3.0/example/solr/conf/schema.xml.

Here is a snippet of the example schema:

<schema name="example" version="1.1">
  ...
  <fields>
    <field name="id" type="string" indexed="true" stored="true" required="true" />
    <field name="sku" type="textTight" indexed="true" stored="true" omitNorms="true"/>
    <field name="name" type="text" indexed="true" stored="true"/>
    ...
  </fields>
  <uniqueKey>id</uniqueKey>
  ...
  <defaultSearchField>text</defaultSearchField>
  ...
</schema>

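Edit the name field element so that the field is called product_name, the field used by the PHP examples below:

<field name="product_name" type="text" indexed="true" stored="true"/>

The example schema also contains copyField elements with source="name" (one of them copies name into the default search field text). Change their source to product_name as well; otherwise a plain query such as garoon will not match the product_name field.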

For the change to take effect, restart the Solr server: stop the running instance (Ctrl+C in its terminal), then start it again from apache-solr-1.3.0/example:

#java -jar start.jar

• Creating an index with PHP

With the PHP Solr Client, accessing Solr is easy. Here is an example of how to create an index in PHP:

<?php
require_once 'Apache/Solr/Service.php';

// 10.60.0.111 is the IP address of the Solr server.
$solr = new Apache_Solr_Service('10.60.0.111', '8983', '/solr');

if (!$solr->ping()) {
    // Stop here: the add/commit calls below would fail anyway.
    exit("Solr service not responding");
}
echo "Solr service is available<br/>";

// Documents to index: each entry maps field names to values.
$parts = array(
    '1' => array(
        'id'           => 'a123',
        'product_name' => 'garoon,test',
    ),
    '2' => array(
        'id'           => 'a456',
        'product_name' => 'share360,test',
    ),
);

$documents = array();
foreach ($parts as $item => $fields) {
    $part = new Apache_Solr_Document();
    foreach ($fields as $key => $value) {
        if (is_array($value)) {
            // Multi-valued field: add each value separately.
            foreach ($value as $datum) {
                $part->setMultiValue($key, $datum);
            }
        } else {
            $part->$key = $value;
        }
    }
    $documents[] = $part;
}

try {
    $solr->addDocuments($documents);
    $solr->commit();   // make the new documents visible to searches
    $solr->optimize(); // merge index segments
} catch (Exception $e) {
    echo $e->getMessage();
}
?>
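Under the hood, documents are sent to Solr's update handler as XML. The sketch below builds a message in Solr's XML update format (an add element holding doc elements, each with field elements) so you can see roughly what the client sends for the two documents above. build_solr_add_xml is a hypothetical helper, not the PHP Solr Client's own code; commit() and optimize() correspond to separate commit and optimize update messages.

```php
<?php
// Hypothetical helper: builds a Solr XML <add> message. Each document is an
// array of field name => value, or field name => array of values (multi-valued).
function build_solr_add_xml(array $docs)
{
    $xml = '<add>';
    foreach ($docs as $fields) {
        $xml .= '<doc>';
        foreach ($fields as $name => $value) {
            // A multi-valued field repeats the <field> element once per value.
            foreach ((array) $value as $datum) {
                $xml .= '<field name="' . htmlspecialchars($name, ENT_QUOTES, 'UTF-8') . '">'
                      . htmlspecialchars($datum, ENT_NOQUOTES, 'UTF-8') . '</field>';
            }
        }
        $xml .= '</doc>';
    }
    return $xml . '</add>';
}

echo build_solr_add_xml(array(
    array('id' => 'a123', 'product_name' => 'garoon,test'),
    array('id' => 'a456', 'product_name' => 'share360,test'),
)), "\n";
```

Escaping with htmlspecialchars keeps characters such as & and < in field values from breaking the XML.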

• Searching the index with PHP

Here is an example of searching the index in PHP:

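<?php
require_once 'Apache/Solr/Service.php';

$solr = new Apache_Solr_Service('10.60.0.111', '8983', '/solr');

if (!$solr->ping()) {
    exit("Solr service not responding");
}
echo "success<br/>";

$offset = 0;   // index of the first result to return
$limit  = 10;  // maximum number of results
// Without a field prefix the query searches the default search field (text);
// use 'product_name:garoon' to search only the product_name field.
$query  = "garoon";

try {
    $response = $solr->search($query, $offset, $limit);
} catch (Exception $e) {
    // The client may throw on HTTP or connection errors.
    exit($e->getMessage());
}

if ($response->getHttpStatus() == 200) {
    if ($response->response->numFound > 0) {
        echo "$query<br/>";
        foreach ($response->response->docs as $doc) {
            echo "id: " . $doc->id . " product_name: " . $doc->product_name . "<br/>";
        }
        echo "<br/>";
    }
} else {
    echo $response->getHttpStatusMessage();
}
?>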
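A search is an HTTP GET against Solr's /select handler, where the query goes in q, the offset in start and the page size in rows. The sketch below builds such a URL for the server used above; build_solr_select_url is a hypothetical helper, and wt=json (JSON response format) is an assumption chosen for illustration, not necessarily what the PHP client sends.

```php
<?php
// Hypothetical helper: builds a URL for Solr's standard /select handler.
// q = query, start = offset of the first result, rows = page size, wt = response format.
function build_solr_select_url($host, $port, $path, $query, $offset = 0, $limit = 10)
{
    $params = array(
        'q'     => $query,
        'start' => $offset,
        'rows'  => $limit,
        'wt'    => 'json',
    );
    // http_build_query() URL-encodes each value (":" becomes %3A, a space becomes +).
    return "http://{$host}:{$port}{$path}/select?" . http_build_query($params);
}

echo build_solr_select_url('10.60.0.111', 8983, '/solr', 'product_name:garoon'), "\n";
```

Pasting the printed URL into a browser is a quick way to check a query before wiring it into PHP.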

• Deleting from the index with PHP

<?php
require_once 'Apache/Solr/Service.php';

// 10.60.0.111 is the IP address of the Solr server.
$solr = new Apache_Solr_Service('10.60.0.111', '8983', '/solr');

if (!$solr->ping()) {
    exit("Solr service not responding");
}
echo "Solr service is available<br/>";

// Delete the document whose unique key (id) is a123.
$response = $solr->deleteById("a123");
// The deletion only becomes visible to searches after a commit.
$solr->commit();
echo $response->getHttpStatusMessage();
?>
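In Solr's XML update format a deletion is a delete element holding either an id (the unique key) or a query. The sketch below builds both forms; the function names are hypothetical, and the query form corresponds to what the client's deleteByQuery() is for. As with adds, nothing changes for searchers until a commit.

```php
<?php
// Hypothetical helper: delete one document by its unique key.
function build_solr_delete_by_id_xml($id)
{
    return '<delete><id>' . htmlspecialchars($id, ENT_NOQUOTES, 'UTF-8') . '</id></delete>';
}

// Hypothetical helper: delete every document matching a Solr query.
function build_solr_delete_by_query_xml($query)
{
    return '<delete><query>' . htmlspecialchars($query, ENT_NOQUOTES, 'UTF-8') . '</query></delete>';
}

echo build_solr_delete_by_id_xml('a123'), "\n";
echo build_solr_delete_by_query_xml('product_name:test'), "\n";
```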

• Updating the index with PHP

To update a document in the index, there are two methods:

Method 1: delete the document by id, then add a new one to the index.

Method 2: add the new document directly. Because id is the unique key field (<uniqueKey>id</uniqueKey> in schema.xml), Solr replaces the old document with the new one. In both cases, commit afterwards.
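Method 2 works because an add with an existing unique key replaces the stored document as a whole; Solr 1.3 does not merge the new fields into the old ones. The toy model below (plain PHP arrays, not Solr) illustrates that consequence: any field you do not resend is lost. toy_add is a hypothetical name used only for this illustration.

```php
<?php
// Toy in-memory model (not Solr itself) of add with <uniqueKey>id</uniqueKey>:
// a document whose id already exists replaces the stored document completely.
function toy_add(array &$index, array $doc)
{
    $index[$doc['id']] = $doc; // same id: old document overwritten, not merged
}

$index = array();
toy_add($index, array('id' => 'a123', 'product_name' => 'garoon,test', 'price' => 10));
toy_add($index, array('id' => 'a123', 'product_name' => 'garoon,updated'));
print_r($index['a123']); // 'price' is gone because it was not resent
```

So when updating, always send every field of the document, not only the changed ones.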

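How can Solr be made to support full-text search in Chinese, Japanese and English? Apache provides a CJK library for this; for details on how to use it, see: http://chaifeng.com/blog/2008/01/_apache_solr.html

By default, Apache Solr does not support Chinese search: if a document contains Chinese text, it can only be found by searching with a complete Chinese sentence.
The following uses the Apache Solr example application. Note: the parts in bold are the ones that need to be changed.
Find the following three lines: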


Change them to:


Find the following two lines:


Change them to:


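After making these changes, restart Apache Solr and Chinese text can be searched. Documents that were imported earlier must be imported again.
Remember that the original configuration contains positionIncrementGap="100"; be sure to remove it, otherwise an exception will occur.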
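The CJK support mentioned above is presumably Lucene's CJKTokenizer, exposed in Solr through solr.CJKTokenizerFactory. It indexes a run of Chinese, Japanese or Korean characters as overlapping two-character tokens, which is why a query no longer has to match a whole sentence. The function below is a simplified illustration of that bigram idea in PHP, not Lucene's implementation (which also handles Latin text, punctuation and mixed runs); cjk_bigrams is a hypothetical name.

```php
<?php
// Simplified illustration of CJK bigram tokenization: a run of CJK characters
// becomes overlapping two-character tokens, so any two adjacent characters
// can match without a dictionary-based word segmenter.
function cjk_bigrams($text)
{
    // Split the UTF-8 string into characters; the "u" modifier makes PCRE
    // work on code points instead of bytes.
    $chars = preg_split('//u', $text, -1, PREG_SPLIT_NO_EMPTY);
    if ($chars === false) {
        return array(); // invalid UTF-8 input
    }
    if (count($chars) < 2) {
        return $chars; // a single character is emitted as-is
    }
    $tokens = array();
    for ($i = 0; $i < count($chars) - 1; $i++) {
        $tokens[] = $chars[$i] . $chars[$i + 1];
    }
    return $tokens;
}

print_r(cjk_bigrams('搜索引擎')); // tokens: 搜索, 索引, 引擎
```

A query for 索引 now matches a document containing 搜索引擎, whereas a whitespace-based tokenizer would have stored the whole run as a single token.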

Note: when writing PHP code, make sure the source files are saved in UTF-8 encoding; otherwise creating the index will fail.
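Besides saving the script itself as UTF-8, it can help to check field values before sending them to Solr. The sketch below relies on the fact that preg_match() with the "u" modifier fails on invalid UTF-8; is_valid_utf8 is a hypothetical helper, and the GBK byte string is an example of text that would need converting first.

```php
<?php
// Hypothetical helper: true if $text is valid UTF-8.
// preg_match() with the "u" modifier returns false for invalid UTF-8 input.
function is_valid_utf8($text)
{
    return preg_match('//u', $text) === 1;
}

// The second value is "中文" encoded in GBK, not UTF-8.
foreach (array('中文', "\xD6\xD0\xCE\xC4") as $value) {
    echo is_valid_utf8($value) ? "valid UTF-8\n" : "not UTF-8: convert before indexing\n";
}
```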
