Searching for Similar Images with the LIRE Library

今天药忘吃喽~ 2022-05-19 07:13

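What is LIRE

LIRE (Lucene Image REtrieval) provides a simple way to build a Lucene index from image features. With that index you can build a content-based image retrieval (CBIR) system that searches for similar images. The features LIRE is described as using come from the MPEG-7 standard: ScalableColor, ColorLayout, and EdgeHistogram; the code later in this post uses another descriptor shipped with the library, CEDD. The library also provides a way to search the index.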
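To make the idea of "content-based" comparison concrete, here is a minimal sketch that uses only the JDK. It is a deliberately simplified stand-in, not one of LIRE's descriptors: each image is reduced to a coarse 64-bin color histogram, and two images are compared by the L1 distance between their histograms. The class name `HistogramSketch` and the helper `solid` are hypothetical names made up for this illustration.

```java
import java.awt.image.BufferedImage;

/** Toy content-based comparison: a coarse RGB histogram plus L1 distance (not a LIRE feature). */
final class HistogramSketch {

    /** Builds a normalized 64-bin histogram (4 levels per RGB channel). */
    static double[] histogram(BufferedImage img) {
        double[] bins = new double[64];
        for (int y = 0; y < img.getHeight(); y++) {
            for (int x = 0; x < img.getWidth(); x++) {
                int rgb = img.getRGB(x, y);
                int r = ((rgb >> 16) & 0xFF) >> 6; // keep the top 2 bits of each channel
                int g = ((rgb >> 8) & 0xFF) >> 6;
                int b = (rgb & 0xFF) >> 6;
                bins[(r << 4) | (g << 2) | b]++;
            }
        }
        double pixels = (double) img.getWidth() * img.getHeight();
        for (int i = 0; i < bins.length; i++) {
            bins[i] /= pixels;
        }
        return bins;
    }

    /** L1 distance between two histograms: 0.0 for identical histograms, at most 2.0. */
    static double distance(double[] a, double[] b) {
        double sum = 0;
        for (int i = 0; i < a.length; i++) {
            sum += Math.abs(a[i] - b[i]);
        }
        return sum;
    }

    /** Creates a small single-color test image (hypothetical helper for the demo). */
    static BufferedImage solid(int rgb) {
        BufferedImage img = new BufferedImage(8, 8, BufferedImage.TYPE_INT_RGB);
        for (int y = 0; y < 8; y++) {
            for (int x = 0; x < 8; x++) {
                img.setRGB(x, y, rgb);
            }
        }
        return img;
    }

    public static void main(String[] args) {
        double[] red = histogram(solid(0xFF0000));
        double[] darkerRed = histogram(solid(0xF00000));
        double[] blue = histogram(solid(0x0000FF));
        System.out.println(distance(red, darkerRed)); // prints 0.0: both fall into the same bin
        System.out.println(distance(red, blue));      // prints 2.0: no overlap at all
    }
}
```

A real descriptor captures far more than a global color distribution, but the workflow is the same: extract a feature vector per image, store it, and rank candidates by distance to the query's vector.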

The rest of this post goes straight to the code.

Project structure

[Figure: project directory structure]

The Gradle dependencies are:

    dependencies {
        // Local jars in libs/ (LIRE is not declared below, so it is presumably loaded from here)
        compile fileTree(dir: 'libs', include: ['*.jar'])
        testCompile group: 'junit', name: 'junit', version: '4.11'
        compile group: 'us.codecraft', name: 'webmagic-core', version: '0.7.3'
        // https://mvnrepository.com/artifact/us.codecraft/webmagic-extension
        compile group: 'us.codecraft', name: 'webmagic-extension', version: '0.7.3'
        compile group: 'commons-io', name: 'commons-io', version: '2.6'
        compile group: 'org.apache.lucene', name: 'lucene-core', version: '6.4.0'
        compile group: 'org.apache.lucene', name: 'lucene-analyzers-common', version: '6.4.0'
        compile group: 'org.apache.lucene', name: 'lucene-queryparser', version: '6.4.0'
        // https://mvnrepository.com/artifact/org.apache.httpcomponents/httpclient
        compile group: 'org.apache.httpcomponents', name: 'httpclient', version: '4.5.6'
    }

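Crawling sample images

The samples are app icons crawled from the Huawei app store with the WebMagic crawler. For how to use WebMagic, see the earlier post 《WebMagic爬取应用市场应用信息》 (crawling app store information with WebMagic).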

    import us.codecraft.webmagic.Page;
    import us.codecraft.webmagic.Site;
    import us.codecraft.webmagic.Spider;
    import us.codecraft.webmagic.processor.PageProcessor;
    import us.codecraft.webmagic.selector.Selectable;

    /**
     * @author wzj
     * @create 2018-07-17 22:06
     **/
    public class AppStoreProcessor implements PageProcessor
    {
        // Crawl configuration for the site: retry count and delay between requests
        private Site site = Site.me().setRetryTimes(5).setSleepTime(1000);

        @Override
        public void process(Page page)
        {
            // Extract the app name
            String name = page.getHtml().xpath("//p/span[@class='title']/text()").toString();
            page.putField("appName", name);

            // Extract the download URL of the app icon
            String downloadIconUrl = page.getHtml().xpath("//img[@class='app-ico']/@src").toString();
            page.putField("downloadIconUrl", downloadIconUrl);

            if (name == null || downloadIconUrl == null)
            {
                // Not an app detail page: skip it so the pipeline never sees it
                page.setSkip(true);
            }

            // Queue the other app detail links found on this page
            Selectable links = page.getHtml().links();
            page.addTargetRequests(links.regex("(http://app.hicloud.com/app/C\\d+)").all());
        }

        @Override
        public Site getSite()
        {
            return site;
        }

        public static void main(String[] args)
        {
            Spider.create(new AppStoreProcessor())
                    .addUrl("http://app.hicloud.com")
                    .addPipeline(new MyPipeline())
                    .thread(20)
                    .run();
        }
    }
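The link filter in `process` is the part that keeps the crawl on app detail pages. As a rough illustration of what that pattern lets through, the sketch below applies the same regular expression with plain `java.util.regex`. It only approximates what `links.regex(...).all()` does inside WebMagic, and the class name `DetailLinkFilter` and the sample URLs are made up for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Applies the detail-page pattern from AppStoreProcessor with plain java.util.regex. */
final class DetailLinkFilter {

    // Same pattern string as in AppStoreProcessor. The dots are unescaped there,
    // so strictly speaking they match any character, not only a literal dot.
    static final Pattern DETAIL = Pattern.compile("(http://app.hicloud.com/app/C\\d+)");

    /** Keeps only links containing a detail-page URL and returns the captured part. */
    static List<String> detailLinks(List<String> links) {
        List<String> result = new ArrayList<>();
        for (String link : links) {
            Matcher m = DETAIL.matcher(link);
            if (m.find()) {
                result.add(m.group(1));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical links, only for illustration
        List<String> links = Arrays.asList(
                "http://app.hicloud.com/app/C10000001",
                "http://app.hicloud.com/soft/list",
                "http://app.hicloud.com/app/C123?from=home");
        System.out.println(detailLinks(links));
    }
}
```

Pages that match neither XPath expression are skipped through `page.setSkip(true)`, so list pages still contribute links without producing results.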

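AppStoreProcessor extracts the icon download URL from each page. A custom Pipeline, MyPipeline, then saves the app icons, using Apache HttpClient to download the images: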

    import org.apache.http.client.methods.CloseableHttpResponse;
    import org.apache.http.client.methods.HttpGet;
    import org.apache.http.impl.client.CloseableHttpClient;
    import org.apache.http.impl.client.HttpClients;
    import us.codecraft.webmagic.ResultItems;
    import us.codecraft.webmagic.Task;
    import us.codecraft.webmagic.pipeline.Pipeline;

    import java.io.IOException;
    import java.io.InputStream;
    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.nio.file.Paths;
    import java.nio.file.StandardCopyOption;
    import java.util.concurrent.atomic.AtomicInteger;

    /**
     * @author wzj
     * @create 2018-07-17 22:16
     **/
    public class MyPipeline implements Pipeline
    {
        /**
         * Directory the icons are saved to, inside the resources directory
         */
        private static final String saveDir = MyPipeline.class.getResource("/conf/image").getPath();

        /**
         * Number of icons processed; atomic because the spider runs 20 threads
         */
        private final AtomicInteger count = new AtomicInteger();

        /**
         * Process extracted results.
         *
         * @param resultItems resultItems
         * @param task task
         */
        @Override
        public void process(ResultItems resultItems, Task task)
        {
            String appName = resultItems.get("appName");
            String downloadIconUrl = resultItems.get("downloadIconUrl");

            try
            {
                saveIcon(downloadIconUrl, appName);
            }
            catch (IOException e)
            {
                e.printStackTrace();
            }

            System.out.println(count.incrementAndGet() + " " + appName);
        }

        public void saveIcon(String downloadUrl, String appName) throws IOException
        {
            Path target = Paths.get(saveDir, appName + ".png");

            // try-with-resources closes the client, the response and the stream,
            // even when the download fails halfway
            try (CloseableHttpClient client = HttpClients.createDefault();
                 CloseableHttpResponse response = client.execute(new HttpGet(downloadUrl));
                 InputStream input = response.getEntity().getContent())
            {
                Files.copy(input, target, StandardCopyOption.REPLACE_EXISTING);
            }
        }
    }

Note: the Huawei app store may have anti-crawling measures; each run only manages to fetch roughly 1,000 icons.
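One detail worth guarding against: MyPipeline uses the app name directly as the file name, and characters such as `/`, `:` or `?` are not allowed in Windows file names. Whether the store actually contains such names is an assumption on my part, but a small helper like the following would make the save step more robust. `IconFileNames` and `toFileName` are hypothetical names, not part of the original project.

```java
/** Turns an app name into a file name that is safe on Windows and Unix-like systems. */
final class IconFileNames {

    /**
     * Replaces the characters \ / : * ? " < > | with underscores and appends ".png".
     * Falls back to "unnamed" when nothing is left after trimming.
     */
    static String toFileName(String appName) {
        String cleaned = appName.trim().replaceAll("[\\\\/:*?\"<>|]", "_");
        return (cleaned.isEmpty() ? "unnamed" : cleaned) + ".png";
    }

    public static void main(String[] args) {
        // Hypothetical app names, only for illustration
        System.out.println(toFileName("QQ/TIM: Lite?"));
        System.out.println(toFileName("WeChat"));
    }
}
```

In `saveIcon`, the target path could then be built as `Paths.get(saveDir, IconFileNames.toFileName(appName))`. Two different app names can still map to the same file name after cleaning, in which case the later download overwrites the earlier one.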

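LIRE test code

Note: in the class below, IMAGE_PATH is the directory containing the images and INDEX_PATH is the directory where the index is stored. After copying the code, change both paths to match your machine.

The indexImages method builds the index. The searchSimilarityImage method looks up the most similar images and prints their similarity scores.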

The first argument of the GenericFastImageSearcher constructor is the number of top results to return. I set it to 5, so the search returns the 5 most similar images.

    ImageSearcher searcher = new GenericFastImageSearcher(5, CEDD.class);
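As a rough picture of what "top 5" means here, the sketch below keeps the N candidates with the smallest distance to a query while scanning a list once. It is a generic illustration using only the JDK, not LIRE's implementation; the `Hit` class, the file names and the distance values are all made up.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

/** Generic top-N selection by smallest distance (illustration only, not LIRE code). */
final class TopNSketch {

    /** A candidate image and its distance to the query (smaller means more similar). */
    static final class Hit {
        final String name;
        final double distance;

        Hit(String name, double distance) {
            this.name = name;
            this.distance = distance;
        }
    }

    /** Returns the n hits with the smallest distance, closest first. */
    static List<Hit> topN(List<Hit> candidates, int n) {
        // Max-heap on distance: the head is always the worst hit kept so far
        PriorityQueue<Hit> heap = new PriorityQueue<>(
                Comparator.comparingDouble((Hit h) -> h.distance).reversed());
        for (Hit h : candidates) {
            heap.add(h);
            if (heap.size() > n) {
                heap.poll(); // drop the current worst hit
            }
        }
        List<Hit> result = new ArrayList<>(heap);
        result.sort(Comparator.comparingDouble((Hit h) -> h.distance));
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical distances, only for illustration
        List<Hit> candidates = Arrays.asList(
                new Hit("a.png", 30.5), new Hit("b.png", 5.0), new Hit("c.png", 12.25),
                new Hit("d.png", 48.0), new Hit("e.png", 7.5));
        for (Hit h : topN(candidates, 3)) {
            System.out.println(h.distance + ": \t" + h.name);
        }
    }
}
```

Keeping only N entries in the heap means memory stays constant no matter how many images are in the index, which is why a result limit is cheap to enforce.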

The more similar an image is, the smaller its score; a score of 1.0 means it is the original image. The complete code follows:

    import net.semanticmetadata.lire.builders.DocumentBuilder;
    import net.semanticmetadata.lire.builders.GlobalDocumentBuilder;
    import net.semanticmetadata.lire.imageanalysis.features.global.CEDD;
    import net.semanticmetadata.lire.searchers.GenericFastImageSearcher;
    import net.semanticmetadata.lire.searchers.ImageSearchHits;
    import net.semanticmetadata.lire.searchers.ImageSearcher;
    import net.semanticmetadata.lire.utils.FileUtils;
    import org.apache.lucene.analysis.core.WhitespaceAnalyzer;
    import org.apache.lucene.document.Document;
    import org.apache.lucene.index.DirectoryReader;
    import org.apache.lucene.index.IndexReader;
    import org.apache.lucene.index.IndexWriter;
    import org.apache.lucene.index.IndexWriterConfig;
    import org.apache.lucene.store.FSDirectory;

    import javax.imageio.ImageIO;
    import java.awt.image.BufferedImage;
    import java.io.File;
    import java.io.IOException;
    import java.nio.file.Paths;
    import java.util.List;

    /**
     * @author wzj
     * @create 2018-07-22 11:16
     **/
    public class ImageSimilarityTest
    {
        /**
         * Directory containing the images
         */
        private static final String IMAGE_PATH = "H:\\JAVA\\ImageSim\\conf\\image";

        /**
         * Directory where the index is stored
         */
        private static final String INDEX_PATH = "H:\\JAVA\\ImageSim\\conf\\index";

        public static void main(String[] args) throws IOException
        {
            // Uncomment on the first run to build the index before searching
            //indexImages();
            searchSimilarityImage();
        }

        private static void indexImages() throws IOException
        {
            // Collect all images under IMAGE_PATH, including subdirectories
            List<String> images = FileUtils.getAllImages(Paths.get(IMAGE_PATH).toFile(), true);

            // Extract the CEDD feature for every image
            GlobalDocumentBuilder globalDocumentBuilder = new GlobalDocumentBuilder(false, false);
            globalDocumentBuilder.addExtractor(CEDD.class);

            IndexWriterConfig conf = new IndexWriterConfig(new WhitespaceAnalyzer());

            // try-with-resources closes the writer even if indexing fails
            try (IndexWriter indexWriter = new IndexWriter(FSDirectory.open(Paths.get(INDEX_PATH)), conf))
            {
                for (String imageFilePath : images)
                {
                    System.out.println("Indexing " + imageFilePath);
                    BufferedImage img = ImageIO.read(new File(imageFilePath));
                    if (img == null)
                    {
                        // ImageIO.read returns null when no reader can decode the file
                        continue;
                    }
                    Document document = globalDocumentBuilder.createDocument(img, imageFilePath);
                    indexWriter.addDocument(document);
                }
            }

            System.out.println("Create index image successful.");
        }

        private static void searchSimilarityImage() throws IOException
        {
            // The query image
            String inputImagePath = "H:\\JAVA\\ImageSim\\conf\\image\\5.png";
            BufferedImage img = ImageIO.read(Paths.get(inputImagePath).toFile());

            try (IndexReader ir = DirectoryReader.open(FSDirectory.open(Paths.get(INDEX_PATH))))
            {
                // Return the 5 most similar images, compared by the CEDD feature
                ImageSearcher searcher = new GenericFastImageSearcher(5, CEDD.class);
                ImageSearchHits hits = searcher.search(img, ir);

                // Print the score and file path of each hit
                for (int i = 0; i < hits.length(); i++)
                {
                    String fileName = ir.document(hits.documentID(i)).getValues(DocumentBuilder.FIELD_NAME_IDENTIFIER)[0];
                    System.out.println(hits.score(i) + ": \t" + fileName);
                }
            }
        }
    }

The test results are as follows:

[Figure: console output of the test run]
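In ImageSimilarityTest, the call to indexImages() is commented out and has to be toggled by hand for the first run. A minimal sketch of one way to automate that decision, assuming an empty or missing index directory means "not indexed yet", is shown below. `IndexCheck` and `needsIndexing` are hypothetical names and are not part of LIRE or Lucene.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

/** Decides whether the index directory still has to be built (illustration only). */
final class IndexCheck {

    /** True when the directory is missing, is not a directory, or contains no entries. */
    static boolean needsIndexing(Path indexDir) {
        String[] entries = indexDir.toFile().list(); // null if the path is not a directory
        return entries == null || entries.length == 0;
    }

    public static void main(String[] args) {
        // In ImageSimilarityTest.main this check could guard the indexing step:
        //     if (IndexCheck.needsIndexing(Paths.get(INDEX_PATH))) indexImages();
        Path indexDir = Paths.get(args.length > 0 ? args[0] : "index");
        System.out.println(needsIndexing(indexDir) ? "build the index first" : "index found");
    }
}
```

This only checks that something is in the directory. It does not verify that the index is complete or that it matches the current set of images, so after adding new icons the index still needs to be rebuilt.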

Source code download

https://download.csdn.net/download/u010889616/10557157
