In Java Lucene, how many times does each term occur in each document?

2024-06-01 • Q&A

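I have indexed about a thousand documents with Lucene, and I want to retrieve the per-document term frequency of every term across all documents. This is how I index them: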

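        import java.nio.file.Paths;
        import java.util.HashMap;
        import java.util.Map;
        import org.apache.lucene.analysis.Analyzer;
        import org.apache.lucene.analysis.standard.StandardAnalyzer;
        import org.apache.lucene.document.*;
        import org.apache.lucene.index.*;
        import org.apache.lucene.store.*;

        HashMap<Integer,String> documentList = getEachDocumentSeparated();
        Analyzer analyzer = new StandardAnalyzer();
        Directory index = FSDirectory.open(Paths.get(RESULT_ADDRESS));
        IndexWriterConfig config = new IndexWriterConfig(analyzer);
        // Overwrite any existing index at this location
        config.setOpenMode(IndexWriterConfig.OpenMode.CREATE);
        IndexWriter w = new IndexWriter(index, config);
        // Stored text field that also records frequencies, positions and offsets
        FieldType fieldType = new FieldType(TextField.TYPE_STORED);
        IndexOptions indexOptions = IndexOptions.DOCS_AND_FREQS_AND_POSITIONS_AND_OFFSETS;
        fieldType.setIndexOptions(indexOptions);
        for (Map.Entry<Integer,String> pair : documentList.entrySet())
        {
            Document doc = new Document();
            Field bodyField = new Field("body", pair.getValue(), fieldType);
            // StringField takes a String value, so convert the Integer key
            doc.add(new StringField("id", String.valueOf(pair.getKey()), Field.Store.YES));
            doc.add(bodyField);
            w.addDocument(doc);
        }
        // Commit the documents and release the index lock
        w.close();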

For example, I want to get a vector like the following:

term, 1(5), 2(10), 330(2), 500(1), 1001(3)

This means that the term occurs 5 times in document 1, 10 times in document 2, and so on...
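To pin down the shape of the result being asked for, here is a minimal sketch in plain Java that builds a term -> (document id -> count) map and prints it in the format above. It does not use the Lucene API at all: the class name `TermFrequencyVectors`, the methods `build` and `format`, and the naive lowercase-and-split tokenizer are illustrative assumptions, and the tokenizer is only a rough stand-in for `StandardAnalyzer`. Against a real index, the same counts would presumably be read from the postings of the "body" field (Lucene exposes them through `TermsEnum` and `PostingsEnum`), which is not shown here.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

public class TermFrequencyVectors {

    // Builds term -> (document id -> occurrences of the term in that document).
    public static Map<String, Map<Integer, Integer>> build(Map<Integer, String> documents) {
        Map<String, Map<Integer, Integer>> postings = new TreeMap<>();
        for (Map.Entry<Integer, String> doc : documents.entrySet()) {
            // Naive tokenization (assumption): lowercase, split on anything
            // that is not a letter or digit. A real analyzer does more.
            String[] tokens = doc.getValue().toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+");
            for (String token : tokens) {
                if (token.isEmpty()) {
                    continue; // split() can yield an empty leading token
                }
                postings.computeIfAbsent(token, t -> new TreeMap<>())
                        .merge(doc.getKey(), 1, Integer::sum);
            }
        }
        return postings;
    }

    // Renders one term as "term, docId(freq), docId(freq), ...".
    public static String format(String term, Map<Integer, Integer> freqs) {
        StringBuilder sb = new StringBuilder(term);
        for (Map.Entry<Integer, Integer> e : freqs.entrySet()) {
            sb.append(", ").append(e.getKey()).append('(').append(e.getValue()).append(')');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Map<Integer, String> docs = new LinkedHashMap<>();
        docs.put(1, "lucene indexes terms; lucene counts terms");
        docs.put(2, "a term appears in lucene");
        // One line per term, in alphabetical order of the terms
        build(docs).forEach((term, freqs) -> System.out.println(format(term, freqs)));
    }
}
```

With the two sample documents in `main`, the line for "lucene" comes out as `lucene, 1(2), 2(1)`. The inner `TreeMap` keeps document ids sorted, so each vector lists documents in ascending order as in the example above.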


ff136158 answered: In Java Lucene, how many times does each term occur in each document?
