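I am analyzing the string "The Brown Fox" with Lucene's StopAnalyzer, using "the" as a stop word, so it is analyzed into two terms: [brown, fox]. I can get the offsets of "brown" (4,9) and "fox" (10,13), but how do I get the offsets of the excluded term "the"?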
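import java.util.List;
import com.google.common.collect.Lists;
import org.apache.lucene.analysis.tokenattributes.OffsetAttribute;

final OffsetAttribute attribute = tokenStream.addAttribute(OffsetAttribute.class);
tokenStream.reset(); // must be called before the first incrementToken()
final List<String> analyzedTerms = Lists.newArrayList();
final StringBuilder stringBuilder = new StringBuilder();
while (tokenStream.incrementToken()) {
    // Offsets point into the original text, not into the analyzed term.
    final int startOffset = attribute.startOffset();
    final int endOffset = attribute.endOffset();
    final String original = text.substring(startOffset, endOffset);
    System.out.println(original);
}
tokenStream.end();   // finish end-of-stream processing
tokenStream.close(); // release the stream's resources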
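
One possible direction, offered as a rough sketch rather than as Lucene's own mechanism: since the loop above already yields the offsets of every kept term, the spans of the original text that no kept term covers can be computed afterwards. The class SkippedSpanFinder and its method findSkippedSpans are hypothetical names, not part of Lucene. The sketch assumes the kept offsets are sorted and non-overlapping, and that every non-whitespace run outside them is a dropped token, so punctuation would be reported as well. As far as I know, a PositionIncrementAttribute value greater than 1 on a kept token also signals that something was removed before it, which could be used to filter the result.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper (not a Lucene class): finds text spans no kept token covers. */
class SkippedSpanFinder {

    /**
     * Returns the [start, end) offsets of every non-whitespace run in text
     * that lies outside all kept-token offsets.
     * Assumption: keptOffsets is sorted by start offset and non-overlapping.
     */
    static List<int[]> findSkippedSpans(String text, List<int[]> keptOffsets) {
        List<int[]> skipped = new ArrayList<>();
        int cursor = 0; // end of the last kept token seen so far
        for (int[] kept : keptOffsets) {
            collectRuns(text, cursor, kept[0], skipped);
            cursor = Math.max(cursor, kept[1]);
        }
        // Trailing text after the last kept token.
        collectRuns(text, cursor, text.length(), skipped);
        return skipped;
    }

    /** Adds each non-whitespace run found in text[from, to) to out. */
    private static void collectRuns(String text, int from, int to, List<int[]> out) {
        int runStart = -1; // -1 means "not currently inside a run"
        for (int i = from; i < to; i++) {
            boolean whitespace = Character.isWhitespace(text.charAt(i));
            if (!whitespace && runStart < 0) {
                runStart = i;
            } else if (whitespace && runStart >= 0) {
                out.add(new int[] {runStart, i});
                runStart = -1;
            }
        }
        if (runStart >= 0) {
            out.add(new int[] {runStart, to});
        }
    }
}
```

To use it with the loop above, each (startOffset, endOffset) pair would be collected into a list and passed in together with the original text once the stream is exhausted.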