How do I filter by the size of a nested-type array?

2024-05-05 • Q&A

Suppose I have the following mapping:

{
    "2019-11-04": {
        "mappings": {
            "_doc": {
                "properties": {
                    "labels": {
                        "type": "nested",
                        "properties": {
                            "confidence": {
                                "type": "float"
                            },
                            "created_at": {
                                "type": "date",
                                "format": "strict_date_optional_time||date_time||epoch_millis"
                            },
                            "label": {
                                "type": "keyword"
                            },
                            "updated_at": {
                                "type": "date"
                            },
                            "value": {
                                "type": "keyword",
                                "fields": {
                                    "numeric": {
                                        "type": "float",
                                        "ignore_malformed": true
                                    }
                                }
                            }
                        }
                    },
                    "params": {
                        "type": "object"
                    },
                    "type": {
                        "type": "keyword"
                    }
                }
            }
        }
    }
}

I want to filter by the size (length) of the labels array. I've tried the following, as the official docs suggest:

{
    "query": {
        "bool": {
            "filter": {
                "script": {
                    "script": {
                        "source": "doc['labels'].size > 10"
                    }
                }
            }
        }
    }
}

But I keep getting:

{
  "error": {
    "root_cause": [
      {
        "type": "script_exception",
        "reason": "runtime error",
        "script_stack": [
          "org.elasticsearch.search.lookup.LeafDocLookup.get(LeafDocLookup.java:81)",
          "org.elasticsearch.search.lookup.LeafDocLookup.get(LeafDocLookup.java:39)",
          "doc['labels'].size > 10",
          "    ^---- HERE"
        ],
        "script": "doc['labels'].size > 10",
        "lang": "painless"
      }
    ],
    "type": "search_phase_execution_exception",
    "reason": "all shards failed",
    "phase": "query",
    "grouped": true,
    "failed_shards": [
      {
        "shard": 0,
        "index": "2019-11-04",
        "node": "kk5MNRPoR4SYeQpLk2By3A",
        "reason": {
          "type": "script_exception",
          "script_stack": [
            "org.elasticsearch.search.lookup.LeafDocLookup.get(LeafDocLookup.java:81)",
            "    ^---- HERE"
          ],
          "lang": "painless",
          "caused_by": {
            "type": "illegal_argument_exception",
            "reason": "No field found for [labels] in mapping with types []"
          }
        }
      }
    ]
  },
  "status": 500
}

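I'm afraid this isn't possible with doc[...], because labels is not a field that holds values of its own: it is a nested object, and Elasticsearch indexes every nested object as a separate hidden document alongside its parent.

doc['fieldname'] only works on leaf fields (keyword, numeric, date and so on) that have doc values on the document the script is running against. A nested field such as labels is not such a field, and its sub-fields live in the hidden nested documents rather than in the parent, which is why the lookup fails with "No field found for [labels]". The Query DSL can likewise only reach those sub-fields through a nested query.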
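To make this concrete, here is a deliberately simplified Python model of how a document with a nested field is split at index time. It is not Elasticsearch code: split_nested and the sample document (its "type" value and the cat/dog labels) are hypothetical. It shows that the parent keeps no labels field for doc['labels'] to find, and that the number of nested objects is simply the number of hidden child documents.

```python
from typing import Any

Doc = dict[str, Any]


def split_nested(source: Doc, nested_path: str) -> tuple[Doc, list[Doc]]:
    """Hypothetical, simplified model of nested indexing: every object under
    `nested_path` becomes its own hidden child document, and the parent keeps
    only the remaining fields."""
    nested = source.get(nested_path, [])
    if isinstance(nested, dict):
        # A single object in a nested field counts as one nested object.
        nested = [nested]
    parent = {k: v for k, v in source.items() if k != nested_path}
    children = [
        {f"{nested_path}.{key}": value for key, value in obj.items()}
        for obj in nested
    ]
    return parent, children


# Hypothetical document shaped like the mapping in the question.
doc = {
    "type": "image",
    "labels": [
        {"label": "cat", "confidence": 0.9},
        {"label": "dog", "confidence": 0.4},
    ],
}

parent, children = split_nested(doc, "labels")
print("labels" in parent)  # False: nothing for doc['labels'] to resolve
print(len(children))       # 2: one hidden document per label
```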

That said, here are two approaches you can take.

For simplicity, I've created a sample mapping, sample documents, and two possible solutions that may help you.

Mapping:

PUT my_sample_index
{
  "mappings": {
    "properties": {
      "myfield": {
        "type": "nested",
        "properties": {
          "label": {
            "type": "keyword"
          }
        }
      }
    }
  }
}

Sample documents:

// single field inside 'myfield'
POST my_sample_index/_doc/1
{
  "myfield": {                              
    "label": ["New York","LA","Austin"]   
  }
}


// two fields inside 'myfield'
POST my_sample_index/_doc/2
{
  "myfield": {
    "label": ["London","Leicester","Newcastle","Liverpool"],
    "country": "England"
  }
}
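Both solutions below call size() on myfield, so it helps to check what these two sample documents actually contain. In the quick Python check below, Python dicts stand in for the JSON documents. It shows that myfield is a single object, so counting it yields its number of keys (1 and 2), while the label array is what holds the 3 and 4 values.

```python
# The two sample documents, as Python dicts.
docs = {
    "1": {"myfield": {"label": ["New York", "LA", "Austin"]}},
    "2": {
        "myfield": {
            "label": ["London", "Leicester", "Newcastle", "Liverpool"],
            "country": "England",
        }
    },
}

for doc_id, source in docs.items():
    myfield = source["myfield"]
    key_count = len(myfield)             # keys in the myfield object
    label_count = len(myfield["label"])  # values in the label array
    print(doc_id, key_count, label_count)
# prints:
# 1 1 3
# 2 2 4
```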

Solution 1: Use script fields (filtering handled at the application level)

This is a workaround rather than an exact solution, but it can help you do the filtering in your service layer or application.

POST my_sample_index/_search
{
  "_source": "*",
  "query": {
    "bool": {
      "must": [
        {
          "match_all": {}
        }
      ]
    }
  },
  "script_fields": {
    "label_size": {
        "script": {
            "lang": "painless",
            "source": "params['_source']['myfield'].size() > 1"
        }
    }
  }
}

You'll notice that the response contains a separate field, label_size, whose value is either true or false.

A sample response:

{
  "took" : 5,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 2,
      "relation" : "eq"
    },
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : "my_sample_index",
        "_type" : "_doc",
        "_id" : "1",
        "_score" : 1.0,
        "_source" : {
          "myfield" : {
            "label" : [
              "New York",
              "LA",
              "Austin"
            ]
          }
        },
        "fields" : {
          "label_size" : [              <---- Scripted Field
            false
          ]
        }
      },
      {
        "_index" : "my_sample_index",
        "_type" : "_doc",
        "_id" : "2",
        "_score" : 1.0,
        "_source" : {
          "myfield" : {
            "country" : "England",
            "label" : [
              "London",
              "Leicester",
              "Newcastle",
              "Liverpool"
            ]
          }
        },
        "fields" : {                  <---- Scripted Field
          "label_size" : [
            true                      <---- True because myfield has two fields, 'label' and 'country'
          ]
        }
      }
    ]
  }
}

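Note that only the second document evaluates to true, because its myfield object has two fields, country and label; myfield is a single object in these samples, so size() counts its keys rather than its labels. If you only want the documents whose label_size is true, you have to filter for that in your application layer.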
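The application-side filtering could look like the following minimal Python sketch. It assumes the search response has already been parsed into a dict by whatever HTTP or Elasticsearch client you use; keep_true_hits is a hypothetical helper, and the response is trimmed to the fields it reads.

```python
from typing import Any

Hit = dict[str, Any]


def keep_true_hits(response: dict[str, Any], field: str = "label_size") -> list[Hit]:
    """Hypothetical helper: keep only the hits whose script field is true."""
    hits = response["hits"]["hits"]
    # Script field values come back as a list under each hit's "fields".
    return [hit for hit in hits if hit.get("fields", {}).get(field) == [True]]


# Trimmed copy of the sample response.
response = {
    "hits": {
        "hits": [
            {"_id": "1", "fields": {"label_size": [False]}},
            {"_id": "2", "fields": {"label_size": [True]}},
        ]
    }
}

matching_ids = [hit["_id"] for hit in keep_true_hits(response)]
print(matching_ids)  # ['2']
```

One consequence of filtering on the client: hits.total and any from/size pagination still reflect the unfiltered match_all, so a page can come back with fewer matching documents than requested.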

Solution 2: Reindex, storing the size in a new field with a script processor

Create a new index as follows:

PUT my_sample_index_temp
{
  "mappings": {
    "properties": {
      "myfield": {
        "type": "nested",
        "properties": {
          "label": {
            "type": "keyword"
          }
        }
      },
      "labels_size": {             <---- New Field where we'd store the size
        "type": "integer"
      }
    }
  }
}

Create the following pipeline:

PUT _ingest/pipeline/set_labels_size
{
  "description": "sets the value of labels size",
  "processors": [
      {
        "script": {
          "source": """
            ctx.labels_size = ctx.myfield.size();
          """
        }
      }
    ]
}
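To see what the script processor computes, here is a rough Python stand-in for the Painless line ctx.labels_size = ctx.myfield.size();. It is an illustration, not ingest-node code; set_labels_size and its source_field parameter are hypothetical. Painless size() returns the number of keys on a Map and the number of elements on a List, and Python's len() behaves the same way on dict and list.

```python
from typing import Any

Doc = dict[str, Any]


def set_labels_size(ctx: Doc, source_field: str = "myfield") -> Doc:
    """Hypothetical stand-in for `ctx.labels_size = ctx.myfield.size();`.

    len() counts keys for a dict (a JSON object) and elements for a list
    (a JSON array), matching Painless Map.size() and List.size()."""
    ctx["labels_size"] = len(ctx[source_field])
    return ctx


# Sample document 2: myfield is one object with two keys.
doc2 = {"myfield": {"label": ["London", "Leicester", "Newcastle", "Liverpool"],
                    "country": "England"}}
print(set_labels_size(doc2)["labels_size"])  # 2

# A document shaped like the question's data: labels is an array of nested objects.
question_doc = {"labels": [{"label": "cat"}, {"label": "dog"}, {"label": "bird"}]}
print(set_labels_size(question_doc, "labels")["labels_size"])  # 3
```

Like the Painless original, this fails when the field is missing (a KeyError here, a null pointer error in Painless), so documents without the field would need a null check, for example storing 0.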

Reindex from my_sample_index using the Reindex API:

POST _reindex
{
  "source": {
    "index": "my_sample_index"
  },
  "dest": {
    "index": "my_sample_index_temp",
    "pipeline": "set_labels_size"
  }
}
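Conceptually, _reindex with dest.pipeline reads each document from the source index, runs it through the pipeline, and writes the result to the destination under the same _id, leaving the source index untouched. The Python model below sketches only that flow. The reindex function and the lambda used as the pipeline are hypothetical stand-ins, and the real API works in batches against a cluster rather than over an in-memory dict.

```python
import copy
from typing import Any, Callable

Doc = dict[str, Any]


def reindex(source_index: dict[str, Doc],
            pipeline: Callable[[Doc], Doc]) -> dict[str, Doc]:
    """Hypothetical model of POST _reindex with "dest.pipeline":
    copy each source document, transform it, keep its _id."""
    dest_index: dict[str, Doc] = {}
    for doc_id, source in source_index.items():
        # deepcopy so the pipeline cannot mutate the source index.
        dest_index[doc_id] = pipeline(copy.deepcopy(source))
    return dest_index


my_sample_index = {
    "1": {"myfield": {"label": ["New York", "LA", "Austin"]}},
    "2": {"myfield": {"label": ["London", "Leicester", "Newcastle", "Liverpool"],
                      "country": "England"}},
}

# Same computation as the set_labels_size script processor.
my_sample_index_temp = reindex(
    my_sample_index,
    lambda ctx: {**ctx, "labels_size": len(ctx["myfield"])},
)

print({doc_id: d["labels_size"] for doc_id, d in my_sample_index_temp.items()})
# {'1': 1, '2': 2}
```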

Verify the documents in my_sample_index_temp with GET my_sample_index_temp/_search:

{
  "took" : 1,
  ...
  "hits" : {
    ...
    "hits" : [
      {
        "_index" : "my_sample_index_temp",
        ...
        "_source" : {
          "labels_size" : 1,                <---- New Field Created
          "myfield" : {
            "label" : [
              "New York",
              "LA",
              "Austin"
            ]
          }
        }
      },
      {
        "_index" : "my_sample_index_temp",
        ...
        "_source" : {
          "labels_size" : 2,                <---- New Field Created
          "myfield" : {
            "country" : "England",
            "label" : [
              "London",
              "Leicester",
              "Newcastle",
              "Liverpool"
            ]
          }
        }
      }
    ]
  }
}

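Now you can simply use the labels_size field in your query, which is much easier (not to mention more efficient). Note that with these sample documents labels_size is the number of keys in myfield (1 and 2), not the number of labels; when the nested field is an array of objects, as labels is in the question, ctx.labels.size() stores the number of objects instead.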
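For example, the question's "> 10" condition becomes an ordinary range filter on the precomputed field. The Python below only builds that request body; labels_size_filter is a hypothetical helper, and you would send the body to GET my_sample_index_temp/_search with whatever client you use. Unlike the script filter in the question, the range clause is answered from the indexed integer values instead of running a script for every document.

```python
import json


def labels_size_filter(greater_than: int) -> dict:
    """Hypothetical helper: search body keeping documents whose
    precomputed labels_size is strictly greater than `greater_than`."""
    return {
        "query": {
            "bool": {
                "filter": {
                    "range": {"labels_size": {"gt": greater_than}}
                }
            }
        }
    }


body = labels_size_filter(10)
print(json.dumps(body, indent=2))
```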

Hope this helps!


Answer by nanaliv11
