How to sort "Hibernate with Lucene" search results by "whole word" matches rather than "contains" matches

I'm using Hibernate Search to provide full-text search over products/items in our store application. This is what my Item class looks like:

@Entity
@Table(name = "items", indexes = {
    @Index(name = "idx_item_uuid", columnList = "uuid", unique = true),
    @Index(name = "idx_item_gtin", columnList = "gtin")})
@Data
@Builder
@AllArgsConstructor
@NoArgsConstructor
@EqualsAndHashCode(onlyExplicitlyIncluded = true,callSuper = true)
@ToString(exclude = {"storeItems"})
@Indexed
@AnalyzerDef(name = "ngram",
    tokenizer = @TokenizerDef(factory = StandardTokenizerFactory.class),
    filters = {
        @TokenFilterDef(factory = StandardFilterFactory.class),
        @TokenFilterDef(factory = LowerCaseFilterFactory.class),
        @TokenFilterDef(factory = StopFilterFactory.class),
        @TokenFilterDef(factory = NGramFilterFactory.class, params = {
            @Parameter(name = "minGramSize", value = "1"),
            @Parameter(name = "maxGramSize", value = "3")})
    }
)
public class Item extends BaseModel {

  @Column(nullable = false)
  @Field(analyzer = @Analyzer(definition = "ngram"))
  private String name;

  @OneToMany(orphanRemoval = true,cascade = CascadeType.ALL,mappedBy = "item",fetch = FetchType.EAGER)
  @Fetch(FetchMode.SELECT)
  private List<Image> images;

  @OneToMany(mappedBy = "item",cascade = CascadeType.REFRESH)
  @Fetch(FetchMode.SELECT)
  @JsonIgnore
  @IndexedEmbedded(includePaths = {"store.uuid"})
  private Set<StoreItem> storeItems;

  @Enumerated(EnumType.STRING)
  private QuantityType quantityType;

  @Column(nullable = false,length = 14)
  private String gtin;

  private String articleSize;

  @ManyToOne(fetch = FetchType.EAGER)
  @JoinColumn(name = "brand_id",foreignKey = @ForeignKey(name = "fk_brands_items"))
  private Brand brand;

  private String supplierName;

  @ManyToOne(fetch = FetchType.EAGER)
  @JoinColumn(name = "category_id",foreignKey = @ForeignKey(name = "fk_categories_items"))
  @IndexedEmbedded(includePaths = {"uuid"})
  private Category category;

  private String taxType;

  private Double taxRate;

  @Lob
  private String marketingMessage;

  private boolean seasonal;

  private String seasonCode;

  @Lob
  private String nutritionalInformation;

  @Lob
  private String ingredients;

  private Double depth;

  private String depthUnit;

  private Double height;

  private String heightUnit;

  private Double width;

  private String widthUnit;

  private Double netContent;

  private String netContentUnit;

  private Double grossWeight;

  private String grossWeightUnit;

  private Double maxStorageTemp;

  private Double minStorageTemp;

  private Double maxTransportTemp;

  private Double minTransportTemp;

  private boolean organic;

  private String origin;

}
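To see why the exact item does not float to the top, here is a plain-Java sketch (no Lucene involved) that enumerates the grams produced for the configured sizes (`minGramSize = 1`, `maxGramSize = 3`) and counts how many of a name's grams also occur among the query's grams. The helper names `ngrams` and `gramHits` are hypothetical, and the count is only a rough proxy: Lucene's real score also depends on its similarity (term frequency, IDF, field-length norms).

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class NGramDemo {

    // Every substring of length min..max, i.e. the grams an n-gram filter with
    // these sizes would produce for a single token (emission order ignored).
    static List<String> ngrams(String token, int min, int max) {
        List<String> grams = new ArrayList<>();
        for (int start = 0; start < token.length(); start++) {
            for (int len = min; len <= max && start + len <= token.length(); len++) {
                grams.add(token.substring(start, start + len));
            }
        }
        return grams;
    }

    // Counts the grams of `name` that also occur among the query's grams:
    // a crude proxy for how many query terms hit a document, NOT Lucene's score.
    static int gramHits(String query, String name) {
        Set<String> queryGrams = new HashSet<>(ngrams(query, 1, 3));
        int hits = 0;
        for (String gram : ngrams(name, 1, 3)) {
            if (queryGrams.contains(gram)) {
                hits++;
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        // Unicode escapes keep the source file ASCII-only.
        String query = "mj\u00f6lk";
        for (String name : List.of("mj\u00f6lk", "mj\u00f6lkchoklad", "kokosmj\u00f6lk")) {
            System.out.println(name + " -> " + gramHits(query, name));
        }
    }
}
```

With these three names the exact item "mjölk" gets 12 gram hits, while "mjölkchoklad" and "kokosmjölk" both get 14, so nothing in the n-gram field by itself marks the exact name as the best match.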

This is how my custom repository searches for items in a specific store:

  @Override
  public List<Item> findItemBySearchStrAndStoreUuid(final String searchStr,final String storeUuid) {
    final EntityManager entityManager = entityManagerFactory.createEntityManager();

    final FullTextEntityManager manager = Search.getFullTextEntityManager(entityManager);
    entityManager.getTransaction().begin();

    final QueryBuilder qb = manager.getSearchFactory()
        .buildQueryBuilder().forEntity(Item.class).get();

    final Query query = qb.bool()
        .must(qb.keyword().onField("name").matching(searchStr).createQuery())
        .must(qb.keyword().onField("storeItems.store.uuid").matching(storeUuid).createQuery())
        .createQuery();

    return executeQuery(entityManager,manager,query);
  }

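We have around 13,000 items in the database, most of them with Swedish names. When a customer searches for milk using the Swedish word "mjölk", milk-related items do show up, but not in the order we want.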

Expected results:

  1. mjölk
  2. mjölkchoklad
  3. kokosmjölk

Actual results:

  1. kokosmjölk
  2. mjölkchoklad
  3. mjölk

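The example may suggest that I simply need to reverse the order, but the "actual results" above are not the real ones; the real order is effectively random. What I need is the item "mjölk" itself first, then items that contain "mjölk" as a whole word, and then all items that contain it as a substring.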
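The ordering asked for here amounts to ranking by match tiers. The sketch below expresses that rule in plain Java, outside Hibernate Search, purely to pin it down: exact name, then whole word, then a word that starts with the query, then any other substring. The prefix tier is an added assumption (it is what puts the compound "mjölkchoklad" ahead of "kokosmjölk" in the expected list), and the names `tier`, `rank` and the sample items "ekologisk mjölk" and "havregryn" are hypothetical.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Locale;

public class TieredRanking {

    // Match tier of `name` for `query`: lower is better, -1 means no match at all.
    static int tier(String name, String query) {
        String n = name.toLowerCase(Locale.ROOT);
        String q = query.toLowerCase(Locale.ROOT);
        if (n.equals(q)) {
            return 0; // the item name is exactly the query
        }
        // Split on every run of characters that is not a letter or a digit.
        String[] words = n.split("[^\\p{L}\\p{N}]+");
        for (String w : words) {
            if (w.equals(q)) {
                return 1; // the query appears as a whole word
            }
        }
        for (String w : words) {
            if (w.startsWith(q)) {
                return 2; // some word starts with the query (e.g. a compound)
            }
        }
        return n.contains(q) ? 3 : -1; // anywhere else in the name, or not at all
    }

    // Keeps only matching names, best tier first, alphabetical within a tier.
    static List<String> rank(List<String> names, String query) {
        List<String> hits = new ArrayList<>();
        for (String name : names) {
            if (tier(name, query) >= 0) {
                hits.add(name);
            }
        }
        hits.sort(Comparator.comparingInt((String s) -> tier(s, query))
                .thenComparing(Comparator.<String>naturalOrder()));
        return hits;
    }

    public static void main(String[] args) {
        // Unicode escapes keep the source file ASCII-only.
        String q = "mj\u00f6lk";
        List<String> names = List.of("kokosmj\u00f6lk", "mj\u00f6lkchoklad", "mj\u00f6lk",
                "ekologisk mj\u00f6lk", "havregryn");
        System.out.println(rank(names, q));
    }
}
```

For the sample list this yields "mjölk", "ekologisk mjölk", "mjölkchoklad", "kokosmjölk" and drops "havregryn". Breaking ties alphabetically within a tier is an arbitrary choice.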

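So please advise how I can improve the analyzer/query to get this ordering. I need to return results even for a single character, and the search should also tolerate some typos, which is why I use the NGram filter in the setup above.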
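The typo tolerance that n-grams give comes from partial gram overlap. As an illustration only, the sketch below measures the overlap between two words with the Dice coefficient over their 1 to 3 character gram sets. Dice is a swapped-in measure chosen for demonstration; it is not how Lucene scores the n-gram field, and the names `gramSet` and `dice` are hypothetical.

```java
import java.util.HashSet;
import java.util.Set;

public class GramSimilarity {

    // Distinct grams of length min..max, matching the question's minGramSize=1, maxGramSize=3.
    static Set<String> gramSet(String s, int min, int max) {
        Set<String> grams = new HashSet<>();
        for (int start = 0; start < s.length(); start++) {
            for (int len = min; len <= max && start + len <= s.length(); len++) {
                grams.add(s.substring(start, start + len));
            }
        }
        return grams;
    }

    // Dice coefficient 2 * |A n B| / (|A| + |B|): 1.0 for identical gram sets, 0.0 for disjoint ones.
    static double dice(String a, String b) {
        Set<String> ga = gramSet(a, 1, 3);
        Set<String> gb = gramSet(b, 1, 3);
        Set<String> shared = new HashSet<>(ga);
        shared.retainAll(gb);
        return 2.0 * shared.size() / (ga.size() + gb.size());
    }

    public static void main(String[] args) {
        String target = "mj\u00f6lk"; // Unicode escape keeps the source ASCII-only
        System.out.println(dice(target, "mjolk")); // 0.5
        System.out.println(dice(target, "kaffe")); // only the gram "k" is shared
    }
}
```

Typing "mjolk" without the umlaut still scores 0.5 against "mjölk", while an unrelated word such as "kaffe" shares only the single-letter gram "k".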

I also tried SwedishLightStemFilterFactory, which did help, but then items stop showing up unless the user types "mjölk" exactly right.

Thanks.

Answer by Jocelynsun:

You need to declare a separate field on the same property, used exclusively for sorting, and assign it a normalizer instead of an analyzer.

See https://docs.jboss.org/hibernate/search/5.11/reference/en-US/html_single/#section-normalizers
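A minimal mapping fragment of what that could look like, assuming Hibernate Search 5.8 or later (where normalizers exist). The normalizer name `lowercase` and the field name `name_sort` are illustrative, and the snippet is a set of pieces to merge into the existing Item class and repository, not a compilable file on its own.

```java
import org.apache.lucene.analysis.core.LowerCaseFilterFactory;
import org.hibernate.search.annotations.Analyzer;
import org.hibernate.search.annotations.Field;
import org.hibernate.search.annotations.Fields;
import org.hibernate.search.annotations.Normalizer;
import org.hibernate.search.annotations.NormalizerDef;
import org.hibernate.search.annotations.SortableField;
import org.hibernate.search.annotations.TokenFilterDef;
import org.hibernate.search.jpa.FullTextQuery;

// (1) On the Item class, next to the existing @AnalyzerDef(name = "ngram", ...):
@NormalizerDef(name = "lowercase",
    filters = @TokenFilterDef(factory = LowerCaseFilterFactory.class))

// (2) On the `name` property, replacing the single @Field:
@Column(nullable = false)
@Fields({
    // the existing n-gram field, still queried as "name"
    @Field(name = "name", analyzer = @Analyzer(definition = "ngram")),
    // hypothetical extra field: the whole name as one lowercased term, for sorting only
    @Field(name = "name_sort", normalizer = @Normalizer(definition = "lowercase"))
})
@SortableField(forField = "name_sort")
private String name;

// (3) Wherever the FullTextQuery is built (the executeQuery helper is not shown):
FullTextQuery fullTextQuery = manager.createFullTextQuery(query, Item.class);
fullTextQuery.setSort(qb.sort().byField("name_sort").createSort());
```

Sorting on `name_sort` orders hits alphabetically by the full lowercased name instead of by relevance score.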

Another answer:

I would consider two things:

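  • ASCIIFoldingFilterFactory: replaces accented characters with their plain equivalents
  • A separate analyzer in which the value is not tokenized and is only lowercased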

Sorting in Hibernate usually calls for a different strategy.
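Both suggestions can be approximated in plain Java to show what they do to a value. `fold` approximates ASCII folding with `java.text.Normalizer` (decompose, then drop combining marks); Lucene's `ASCIIFoldingFilter` uses its own mapping table and covers more characters, so this is only a sketch. `keywordKey` (a hypothetical name) mimics an untokenized, lowercased analysis in which the whole value stays a single term.

```java
import java.text.Normalizer;
import java.util.Locale;

public class FoldingDemo {

    // Approximate ASCII folding: NFD splits an accented letter into base letter
    // plus combining mark(s), then every combining mark (\p{M}) is removed.
    static String fold(String s) {
        String decomposed = Normalizer.normalize(s, Normalizer.Form.NFD);
        return decomposed.replaceAll("\\p{M}+", "");
    }

    // Untokenized "keyword + lowercase" style key: the whole value becomes one term.
    static String keywordKey(String s) {
        return fold(s).toLowerCase(Locale.ROOT).trim();
    }

    public static void main(String[] args) {
        // Unicode escapes keep the source file ASCII-only.
        System.out.println(fold("mj\u00f6lk"));                 // mjolk
        System.out.println(keywordKey("  Mj\u00f6lkchoklad "));  // mjolkchoklad
    }
}
```

If folding is applied at both index and query time, "mjolk" typed without the umlaut maps to the same terms as "mjölk".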

