Package | Description
---|---|
org.apache.lucene.analysis | API and code to convert text into indexable/searchable tokens.
org.apache.lucene.analysis.ar | Analyzer for Arabic.
org.apache.lucene.analysis.cjk | Analyzer for Chinese, Japanese, and Korean, which indexes bigrams (overlapping groups of two adjacent Han characters).
org.apache.lucene.analysis.cn | Analyzer for Chinese, which indexes unigrams (individual Chinese characters).
org.apache.lucene.analysis.cn.smart | Analyzer for Simplified Chinese, which indexes words.
org.apache.lucene.analysis.ngram | Character n-gram tokenizers and filters.
org.apache.lucene.analysis.ru | Analyzer for Russian.
org.apache.lucene.analysis.sinks | Implementations of SinkTokenizer that may be useful.
org.apache.lucene.analysis.standard | A fast grammar-based tokenizer constructed with JFlex.
org.apache.lucene.wikipedia.analysis | Tokenizer that is aware of Wikipedia syntax.
Modifier and Type | Class and Description
---|---|
class | CharTokenizer: an abstract base class for simple, character-oriented tokenizers.
class | KeywordTokenizer: emits the entire input as a single token.
class | LetterTokenizer: a tokenizer that divides text at non-letters.
class | LowerCaseTokenizer: performs the function of LetterTokenizer and LowerCaseFilter together.
class | SinkTokenizer: Deprecated. Use TeeSinkTokenFilter instead.
class | WhitespaceTokenizer: a tokenizer that divides text at whitespace.
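The character-oriented tokenizers above all follow the same pattern: scan the input and emit maximal runs of characters that satisfy some predicate. The following is a minimal, self-contained sketch of that idea, combining what LetterTokenizer and LowerCaseTokenizer describe (split at non-letters, lowercase each run); it is an illustration only, not the Lucene API.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of a character-oriented tokenizer: emit maximal runs
// of letters, lowercased, as LetterTokenizer + LowerCaseFilter would.
public class LetterRunTokenizer {
    public static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (char c : text.toCharArray()) {
            if (Character.isLetter(c)) {
                current.append(Character.toLowerCase(c));  // normalize as we go
            } else if (current.length() > 0) {
                tokens.add(current.toString());            // close the run
                current.setLength(0);
            }
        }
        if (current.length() > 0) tokens.add(current.toString());
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("Hello, World 42!")); // [hello, world]
    }
}
```

Swapping the `Character.isLetter` predicate for `!Character.isWhitespace` yields the WhitespaceTokenizer behavior instead.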
Modifier and Type | Class and Description
---|---|
class | ArabicLetterTokenizer: a tokenizer that breaks text into runs of letters and diacritics.
Modifier and Type | Class and Description
---|---|
class | CJKTokenizer: a tokenizer designed for the Chinese, Japanese, and Korean languages.
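The overlapping-bigram scheme that the CJK package describes can be sketched in a few lines: within a run of Han characters, each adjacent pair becomes one token. This is an illustration of the technique only, not the Lucene implementation; the handling of a lone character (emitting it as-is) is an assumption.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of overlapping Han-character bigrams,
// the indexing scheme described for the CJK analyzer.
public class HanBigrams {
    public static List<String> bigrams(String run) {
        List<String> tokens = new ArrayList<>();
        if (run.length() == 1) {       // assumption: a lone character is kept as-is
            tokens.add(run);
            return tokens;
        }
        for (int i = 0; i + 1 < run.length(); i++) {
            tokens.add(run.substring(i, i + 2));  // adjacent pair at position i
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(bigrams("中华人民")); // [中华, 华人, 人民]
    }
}
```

Note how adjacent bigrams share a character, so a query bigram can match at any position in the indexed run.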
Modifier and Type | Class and Description
---|---|
class | ChineseTokenizer: tokenizes Chinese text into individual Chinese characters.
Modifier and Type | Class and Description
---|---|
class | SentenceTokenizer: tokenizes input text into sentences.
Modifier and Type | Class and Description
---|---|
class | EdgeNGramTokenizer: tokenizes the input from an edge into n-grams of given size(s).
class | NGramTokenizer: tokenizes the input into n-grams of the given size(s).
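The difference between the two n-gram tokenizers above is where the grams are anchored: NGramTokenizer emits every n-gram in the size range at every position, while EdgeNGramTokenizer emits only the grams starting at one edge of the input. A minimal sketch of both, as an illustration rather than the Lucene API:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of character n-gram generation for a size range
// [minGram, maxGram], front edge only in the edge variant.
public class CharNGrams {
    // All n-grams of each size at every position, as NGramTokenizer describes.
    public static List<String> ngrams(String text, int minGram, int maxGram) {
        List<String> grams = new ArrayList<>();
        for (int n = minGram; n <= maxGram; n++) {
            for (int i = 0; i + n <= text.length(); i++) {
                grams.add(text.substring(i, i + n));
            }
        }
        return grams;
    }

    // Only the grams anchored at the front edge, as EdgeNGramTokenizer describes.
    public static List<String> edgeNGrams(String text, int minGram, int maxGram) {
        List<String> grams = new ArrayList<>();
        for (int n = minGram; n <= Math.min(maxGram, text.length()); n++) {
            grams.add(text.substring(0, n));
        }
        return grams;
    }

    public static void main(String[] args) {
        System.out.println(ngrams("abcd", 2, 3));     // [ab, bc, cd, abc, bcd]
        System.out.println(edgeNGrams("abcd", 1, 3)); // [a, ab, abc]
    }
}
```

Edge n-grams are the usual building block for prefix ("search-as-you-type") matching, while full n-grams support substring matching at the cost of a larger index.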
Modifier and Type | Class and Description
---|---|
class | RussianLetterTokenizer: a Tokenizer that extends LetterTokenizer by additionally looking up letters in a given "Russian charset".
Modifier and Type | Class and Description
---|---|
class | DateRecognizerSinkTokenizer: Deprecated. Use DateRecognizerSinkFilter and TeeSinkTokenFilter instead.
class | TokenRangeSinkTokenizer: Deprecated. Use TokenRangeSinkFilter and TeeSinkTokenFilter instead.
class | TokenTypeSinkTokenizer: Deprecated. Use TokenTypeSinkFilter and TeeSinkTokenFilter instead.
Modifier and Type | Class and Description
---|---|
class | StandardTokenizer: a grammar-based tokenizer constructed with JFlex.
Modifier and Type | Class and Description
---|---|
class | WikipediaTokenizer: an extension of StandardTokenizer that is aware of Wikipedia syntax.
Copyright © 2000-2014 Apache Software Foundation. All Rights Reserved.