Building a Search Engine with Elasticsearch

浅浅的花香味﹌ 2022-04-04 06:26


Official Elasticsearch site: https://www.elastic.co/products/elasticsearch

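1. Why Elasticsearch

(1) Elasticsearch is an open-source, real-time, distributed search and analytics engine built on top of the full-text search library Apache Lucene™. It offers:

  • full-text search;
  • a distributed real-time document store in which every field is indexed and searchable;
  • a distributed search engine with real-time analytics;
  • scaling to hundreds of servers and petabytes of structured or unstructured data.

All of this is packaged into a single service that your application can talk to through a simple RESTful API, clients for various languages, or even the command line.

(2) It is easy to learn and well documented.

Choosing a search engine, Elasticsearch vs. Solr: http://www.cnblogs.com/chowmin/articles/4629220.html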

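2. Installing and configuring Elasticsearch

We will use the ansj tokenizer for word segmentation. At the time of writing, the latest ansj plugin for Elasticsearch targets Elasticsearch 5.0.1, so we download Elasticsearch 5.0.1.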

(1) Download and extract

    wget https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-5.0.1.tar.gz
    sha1sum elasticsearch-5.0.1.tar.gz
    tar -xzf elasticsearch-5.0.1.tar.gz
    cd elasticsearch-5.0.1/

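(2) Start ES

./bin/elasticsearch

16-12-11T17:28:33,912][INFO ][o.e.n.Node ] [rpA7Jx3] started

A log line like the one above means ES has started.

Open a new terminal and check that it is running:

curl -XGET 'localhost:9200/?pretty'

If this returns a JSON response describing the node, ES is running.

Press Ctrl-C to stop ES.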
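The same check can be scripted from Java. The following is a minimal sketch using only the JDK; the class name `EsPing` and the heuristic of looking for a `cluster_name` field in the response body are assumptions made for illustration, not part of the original setup.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: checks whether an Elasticsearch node answers on its HTTP port.
class EsPing {
    // Heuristic (assumption): a 200 response whose body mentions "cluster_name" is a live node.
    static boolean looksLikeEs(int status, String body) {
        return status == 200 && body != null && body.contains("\"cluster_name\"");
    }

    // Fetches the root endpoint; returns false if the node is unreachable or answers oddly.
    static boolean ping(String baseUrl) {
        try {
            HttpURLConnection conn = (HttpURLConnection) new URL(baseUrl).openConnection();
            conn.setConnectTimeout(2000);
            conn.setReadTimeout(2000);
            int status = conn.getResponseCode();
            if (status != 200) {
                return false;
            }
            try (InputStream in = conn.getInputStream()) {
                String body = new String(in.readAllBytes(), StandardCharsets.UTF_8);
                return looksLikeEs(status, body);
            }
        } catch (IOException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(ping("http://localhost:9200/") ? "ES is up" : "ES is not reachable");
    }
}
```

Keeping the decision in `looksLikeEs` separate from the network call makes the rule easy to adjust without touching the HTTP code.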

3. Installing and configuring the ansj tokenizer

From the ES directory, run:

./bin/elasticsearch-plugin install http://maven.nlpcn.org/org/ansj/elasticsearch-analysis-ansj/5.0.1.0/elasticsearch-analysis-ansj-5.0.1.0-release.zip

4. Fixing Elasticsearch startup errors

(1) Java HotSpot™ 64-Bit Server VM warning: INFO:
os::commit_memory(0x0000000085330000, 2060255232, 0) failed; error='Cannot allocate memory' (errno=12)

Elasticsearch 5.0 allocates a 2 GB JVM heap by default, which this machine cannot provide. Lower the heap size in config/jvm.options:

    vim config/jvm.options

Change

    -Xms2g
    -Xmx2g

to

    -Xms512m
    -Xmx512m

(2) max number of threads [1024] for user [elasticsearch] is too low, increase to at least [2048]

Edit /etc/security/limits.d/90-nproc.conf:

Before: * soft nproc 1024
After:  * soft nproc 2048

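(3) max virtual memory areas vm.max_map_count [65530] is too low, increase to at least [262144]

Check whether /etc/sysctl.conf already sets vm.max_map_count. If it does, the command prints the setting, as in the second line here:

    cat /etc/sysctl.conf | grep vm.max_map_count
    vm.max_map_count=262144

If the setting is missing, append it as root and reload the kernel parameters with sysctl -p:

    echo "vm.max_map_count=262144" >>/etc/sysctl.conf
    sysctl -p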
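To confirm the value the kernel is actually using, a small preflight check can read it back. This is a sketch under the assumption of a Linux host where `/proc/sys/vm/max_map_count` is readable; the class name `MapCountCheck` is hypothetical, and the threshold is the one quoted in the error message above.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical preflight check for the vm.max_map_count bootstrap requirement.
class MapCountCheck {
    // Minimum quoted in the Elasticsearch error message.
    static final long REQUIRED = 262144L;

    // Parses the raw file content (a number followed by a newline) and compares it with the minimum.
    static boolean isSufficient(String raw) {
        return Long.parseLong(raw.trim()) >= REQUIRED;
    }

    public static void main(String[] args) throws IOException {
        // Linux-only path (assumption): exposes the live value of vm.max_map_count.
        String raw = new String(Files.readAllBytes(Paths.get("/proc/sys/vm/max_map_count")));
        System.out.println("vm.max_map_count=" + raw.trim()
                + (isSufficient(raw) ? " (ok)" : " (too low)"));
    }
}
```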

(4) max file descriptors [65535] for elasticsearch process likely too low, increase to at least [65536]

Raise the limit in the shell that starts ES:

ulimit -n 65536

(5) [root@localhost elasticsearch-5.0.1]# ./bin/elasticsearch
[WARN ][o.e.b.ElasticsearchUncaughtExceptionHandler] [] uncaught exception in thread [main]
org.elasticsearch.bootstrap.StartupException: java.lang.RuntimeException: can not run elasticsearch as root

Note: ES refuses to start as root. Run it as a regular user instead.

5. Setting up the Elasticsearch Java API

Add the following dependencies to pom.xml:

    <!-- elasticsearch Java API -->
    <dependency>
        <groupId>org.elasticsearch.client</groupId>
        <artifactId>transport</artifactId>
        <version>5.0.1</version>
    </dependency>
    <dependency>
        <groupId>org.apache.logging.log4j</groupId>
        <artifactId>log4j-api</artifactId>
        <version>2.8.0</version>
    </dependency>
    <dependency>
        <groupId>org.apache.logging.log4j</groupId>
        <artifactId>log4j-core</artifactId>
        <version>2.8.0</version>
    </dependency>

6. Elasticsearch tutorials

  • Official guide: https://www.elastic.co/guide/en/elasticsearch/reference/current/zip-targz.html
  • Elasticsearch basics tutorial: http://blog.csdn.net/cnweike/article/details/33736429
  • Elasticsearch Java API tutorial: http://www.07net01.com/2016/07/1603264.html

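1. Bulk export with the Java API

    // Written against the Elasticsearch 1.x Java API (ImmutableSettings, SearchType.SCAN).
    // On 5.x, SearchType.SCAN no longer exists and the client is built with PreBuiltTransportClient.
    import java.io.BufferedWriter;
    import java.io.FileWriter;
    import java.io.IOException;
    import org.elasticsearch.action.search.SearchResponse;
    import org.elasticsearch.action.search.SearchType;
    import org.elasticsearch.client.Client;
    import org.elasticsearch.client.transport.TransportClient;
    import org.elasticsearch.common.settings.ImmutableSettings;
    import org.elasticsearch.common.settings.Settings;
    import org.elasticsearch.common.transport.InetSocketTransportAddress;
    import org.elasticsearch.common.unit.TimeValue;
    import org.elasticsearch.index.query.QueryBuilders;
    import org.elasticsearch.search.SearchHit;

    public class EsExport {
        public static void main(String[] args) {
            Settings settings = ImmutableSettings.settingsBuilder().put("cluster.name", "elasticsearch-bigdata").build();
            Client client = new TransportClient(settings)
                    .addTransportAddress(new InetSocketTransportAddress("10.58.71.6", 9300));
            // SearchType.SCAN tells ES to skip sorting and simply return the hits;
            // setScroll(new TimeValue(600000)) sets how long the scroll stays alive, in milliseconds.
            SearchResponse response = client.prepareSearch("bigdata").setTypes("student")
                    .setQuery(QueryBuilders.matchAllQuery()).setSize(10000).setScroll(new TimeValue(600000))
                    .setSearchType(SearchType.SCAN).execute().actionGet();
            String scrollId = response.getScrollId();
            // Append the exported documents to the file "es", one JSON document per line
            try (BufferedWriter out = new BufferedWriter(new FileWriter("es", true))) {
                // Fetch one batch per request (size 10000 as set above) until everything is read
                while (true) {
                    SearchResponse page = client.prepareSearchScroll(scrollId).setScroll(new TimeValue(1000000))
                            .execute().actionGet();
                    scrollId = page.getScrollId(); // always continue with the most recent scroll id
                    SearchHit[] hits = page.getHits().getHits();
                    // An empty batch means there is nothing left to read
                    if (hits.length == 0) {
                        break;
                    }
                    System.out.println("Hits in this batch: " + hits.length);
                    for (SearchHit hit : hits) {
                        out.write(hit.getSourceAsString());
                        out.write("\r\n");
                    }
                }
                System.out.println("Export finished");
            } catch (IOException e) {
                e.printStackTrace();
            } finally {
                client.close();
            }
        }
    }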
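The control flow of a scroll export can be studied without a running cluster. The sketch below models it with plain JDK types: `ScrollLoop`, `Page` and the fake backend are hypothetical stand-ins for the Elasticsearch client, and only the loop shape (fetch, stop on an empty page, carry the newest cursor forward) is the point.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.function.Function;

// Hypothetical model of a scroll-style export loop; not an Elasticsearch API.
class ScrollLoop {
    // One page of results plus the cursor to use for the next request.
    static final class Page {
        final List<String> docs;
        final String nextCursor;

        Page(List<String> docs, String nextCursor) {
            this.docs = docs;
            this.nextCursor = nextCursor;
        }
    }

    // Fetches pages until one comes back empty, always passing on the most recent cursor.
    static List<String> drain(String firstCursor, Function<String, Page> fetch) {
        List<String> all = new ArrayList<>();
        String cursor = firstCursor;
        while (true) {
            Page page = fetch.apply(cursor);
            if (page.docs.isEmpty()) {
                break;
            }
            all.addAll(page.docs);
            cursor = page.nextCursor;
        }
        return all;
    }

    public static void main(String[] args) {
        // Fake backend: cursor "0" yields two docs, "1" yields one, anything else an empty page.
        Function<String, Page> fake = c -> {
            if (c.equals("0")) return new Page(Arrays.asList("{\"a\":1}", "{\"a\":2}"), "1");
            if (c.equals("1")) return new Page(Arrays.asList("{\"a\":3}"), "2");
            return new Page(Collections.<String>emptyList(), c);
        };
        System.out.println(drain("0", fake).size()); // prints 3
    }
}
```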

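2. Bulk import with the Java API

    // Written against the Elasticsearch 1.x Java API (ImmutableSettings, new TransportClient(settings)).
    import java.io.BufferedReader;
    import java.io.FileReader;
    import java.io.IOException;
    import org.elasticsearch.action.bulk.BulkRequestBuilder;
    import org.elasticsearch.client.Client;
    import org.elasticsearch.client.transport.TransportClient;
    import org.elasticsearch.common.settings.ImmutableSettings;
    import org.elasticsearch.common.settings.Settings;
    import org.elasticsearch.common.transport.InetSocketTransportAddress;

    public class EsImport {
        public static void main(String[] args) {
            Settings settings = ImmutableSettings.settingsBuilder().put("cluster.name", "elasticsearch-bigdata").build();
            Client client = new TransportClient(settings)
                    .addTransportAddress(new InetSocketTransportAddress("10.58.71.6", 9300));
            // Read back the data exported in the previous step
            try (BufferedReader br = new BufferedReader(new FileReader("es"))) {
                String json;
                int count = 0;
                // Start a bulk request
                BulkRequestBuilder bulkRequest = client.prepareBulk();
                while ((json = br.readLine()) != null) {
                    bulkRequest.add(client.prepareIndex("bigdata", "student").setSource(json));
                    count++;
                    // Submit every 1000 documents, then start a fresh bulk request
                    // so that already-submitted documents are not sent again
                    if (count % 1000 == 0) {
                        bulkRequest.execute().actionGet();
                        bulkRequest = client.prepareBulk();
                        System.out.println("Submitted: " + count);
                    }
                }
                // Submit the remaining documents, if any
                if (bulkRequest.numberOfActions() > 0) {
                    bulkRequest.execute().actionGet();
                }
                System.out.println("Import finished");
            } catch (IOException e) {
                e.printStackTrace();
            } finally {
                client.close();
            }
        }
    }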
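A bulk import loop of this kind has two details that are easy to get wrong: each batch should be sent exactly once, and the final partial batch still needs to be flushed. The following sketch isolates that batching logic with plain JDK types; `Batcher` and `sendInBatches` are hypothetical names, and the `flush` callback stands in for executing a bulk request.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical batching helper; the flush callback stands in for a bulk request.
class Batcher {
    // Splits items into batches of at most batchSize and hands each batch to flush exactly once.
    // Returns the number of flushes performed.
    static <T> int sendInBatches(Iterable<T> items, int batchSize, Consumer<List<T>> flush) {
        List<T> batch = new ArrayList<>();
        int flushes = 0;
        for (T item : items) {
            batch.add(item);
            if (batch.size() == batchSize) {
                flush.accept(batch);
                batch = new ArrayList<>(); // fresh batch so nothing is sent twice
                flushes++;
            }
        }
        if (!batch.isEmpty()) { // send the remainder, but never an empty request
            flush.accept(batch);
            flushes++;
        }
        return flushes;
    }

    public static void main(String[] args) {
        List<Integer> docs = new ArrayList<>();
        for (int i = 0; i < 2500; i++) {
            docs.add(i);
        }
        int n = sendInBatches(docs, 1000, b -> System.out.println("flushing " + b.size()));
        System.out.println("flushes: " + n); // batches of 1000, 1000 and 500, so 3 flushes
    }
}
```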

Two ways to import data into Elasticsearch

Method 1: manual import

1. cat test.json

    {"index":{"_index":"stuff_orders","_type":"order_list","_id":903713}}
    {"real_name":"刘备","user_id":48430,"address_province":"上海","address_city":"浦东新区","address_district":null,"address_street":"上海市浦东新区广兰路1弄2号345室","price":30.0,"carriage":6.0,"state":"canceled","created_at":"2013-10-24T09:09:28.000Z","payed_at":null,"goods":["营养早餐:火腿麦满分"],"position":[121.53,31.22],"weight":70.0,"height":172.0,"sex_type":"female","birthday":"1988-01-01"}
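The file follows the bulk format: an action line, then the document source on the next line, with every line terminated by a newline. As a rough sketch, such a body could be assembled in Java as follows. `BulkBody` is a hypothetical helper, it writes the `_id` as a JSON string rather than a number, and it assumes the index, type and id values contain no characters that need JSON escaping.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper that assembles a newline-delimited bulk request body.
class BulkBody {
    // Action line for one "index" operation.
    // Assumption: index, type and id contain no characters that need JSON escaping.
    static String actionLine(String index, String type, String id) {
        return "{\"index\":{\"_index\":\"" + index + "\",\"_type\":\"" + type
                + "\",\"_id\":\"" + id + "\"}}";
    }

    // Pairs each source document with its action line; every line, including the last, ends in \n.
    static String build(String index, String type, List<String> ids, List<String> sources) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < sources.size(); i++) {
            sb.append(actionLine(index, type, ids.get(i))).append('\n');
            sb.append(sources.get(i)).append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String body = build("stuff_orders", "order_list",
                Arrays.asList("903713"),
                Arrays.asList("{\"real_name\":\"刘备\",\"user_id\":48430}"));
        System.out.print(body);
    }
}
```

The resulting string is what would be saved to a file such as test.json and sent with `--data-binary`, which preserves the newlines the bulk endpoint relies on.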

2. Import into Elasticsearch

    [root@ELKServer opt]# curl -XPOST 'localhost:9200/stuff_orders/_bulk?pretty' --data-binary @test.json
    {
      "took" : 600,
      "errors" : false,
      "items" : [ {
        "index" : {
          "_index" : "stuff_orders",
          "_type" : "order_list",
          "_id" : "903713",
          "_version" : 1,
          "_shards" : {
            "total" : 2,
            "successful" : 1,
            "failed" : 0
          },
          "status" : 201
        }
      } ]
    }

3. Check that the data is in Elasticsearch

    [root@ELKServer opt]# curl 'localhost:9200/stuff_orders/order_list/903713?pretty'
    {
      "_index" : "stuff_orders",
      "_type" : "order_list",
      "_id" : "903713",
      "_version" : 1,
      "found" : true,
      "_source" : {
        "real_name" : "刘备",
        "user_id" : 48430,
        "address_province" : "上海",
        "address_city" : "浦东新区",
        "address_district" : null,
        "address_street" : "上海市浦东新区广兰路1弄2号345室",
        "price" : 30.0,
        "carriage" : 6.0,
        "state" : "canceled",
        "created_at" : "2013-10-24T09:09:28.000Z",
        "payed_at" : null,
        "goods" : [ "营养早餐:火腿麦满分" ],
        "position" : [ 121.53, 31.22 ],
        "weight" : 70.0,
        "height" : 172.0,
        "sex_type" : "female",
        "birthday" : "1988-01-01"
      }
    }

Method 2: import from a database

1. Download and install the elasticsearch-jdbc-2.3.4.0 importer

    # The elasticsearch-jdbc version must match the Elasticsearch version you have installed.
    wget http://xbib.org/repository/org/xbib/elasticsearch/importer/elasticsearch-jdbc/2.3.4.0/elasticsearch-jdbc-2.3.4.0-dist.zip
    unzip elasticsearch-jdbc-2.3.4.0-dist.zip
    mv elasticsearch-jdbc-2.3.4.0 /usr/local/
    cd /usr/local/elasticsearch-jdbc-2.3.4.0/

2. Write the import script

Create import.sh (for example with vim import.sh) with the content below. In the JSON, elasticsearch.cluster is the cluster name (see /usr/local/elasticsearch/config/elasticsearch.yml); url, user and password are the MySQL address, user name and password; index and type name the new index and type.

    #!/bin/sh
    JDBC_IMPORTER_HOME=/usr/local/elasticsearch-jdbc-2.3.4.0
    bin=$JDBC_IMPORTER_HOME/bin
    lib=$JDBC_IMPORTER_HOME/lib
    echo '{
        "type" : "jdbc",
        "jdbc": {
            "elasticsearch.autodiscover":true,
            "elasticsearch.cluster":"my-application",
            "url":"jdbc:mysql://localhost:3306/test",
            "user":"test",
            "password":"1234",
            "sql":"select *,id as _id from workers_info",
            "elasticsearch" : {
                "host" : "192.168.10.49",
                "port" : 9300
            },
            "index" : "myindex",
            "type" : "mytype"
        }
    }' | java -cp "${lib}/*" -Dlog4j.configurationFile=${bin}/log4j2.xml org.xbib.tools.Runner org.xbib.tools.JDBCImporter

Make the script executable and run it:

    chmod +x import.sh
    sh import.sh

3. Check that the data was imported into Elasticsearch

    [root@ELKServer bin]# curl -XGET 'http://localhost:9200/myindex/mytype/_search?pretty'
    {
      "took" : 15,
      "timed_out" : false,
      "_shards" : {
        "total" : 5,
        "successful" : 5,
        "failed" : 0
      },
      "hits" : {
        "total" : 1,
        "max_score" : 1.0,
        "hits" : [ {
          "_index" : "myindex",
          "_type" : "mytype",
          "_id" : "AVZyXCReGHjmX33dpJi3",
          "_score" : 1.0,
          "_source" : {
            "id" : 1,
            "workername" : "xing",
            "salary" : 10000,
            "tel" : "1598232123",
            "mailbox" : "xing@qq.com",
            "department" : "yanfa",
            "sex" : "F",
            "qq" : 736019646,
            "EmployedDates" : "2012-12-21T00:00:00.000+08:00"
          }
        } ]
      }
    }

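Hands-on code

Classical Chinese poetry search engine, GitHub repository: https://github.com/AngelaFighting/gushiwensearch

1. Start ES

On Windows, open a command prompt in the bin directory of the ES installation, type elasticsearch.bat and press Enter. If the ES cluster reports started and its status is Green, startup succeeded.

2. Open the home page in a browser

Enter the text to search for, choose the search scope, and click the search button. The page then shows the number of matches and a summary of each result.

Click the link of a poem to view its full details.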
