java.lang.Object
  com.aliasi.tokenizer.RegExTokenizerFactory
    com.aliasi.tokenizer.LineTokenizerFactory
public class LineTokenizerFactory
A LineTokenizerFactory treats each line of an input as
a token. The whitespaces separating tokens are simply the line
terminators between them. This is useful for decoders that work at
the line level.
Line terminators are as defined in Pattern,
and include all of the Windows, Unix, and Macintosh standards, as well
as some Unicode extensions.
Whitespaces will be either empty strings or strings representing one or more newlines.
Tokens may consist entirely of whitespace characters if whitespace is the only thing on a line, but tokens will never contain sequences representing newlines. Tokens will always consist of at least one character.
| Input String | Tokens | Whitespaces |
|---|---|---|
| "" | {} | { "" } |
| "abc" | { "abc" } | { "", "" } |
| "abc\ndef" | { "abc", "def" } | { "", "\n", "" } |
| "abc\r\ndef" | { "abc", "def" } | { "", "\r\n", "" } |
| " abc\n def \n" | { " abc", " def " } | { "", "\n", "\n" } |
| " \n" | { " " } | { "", "\n" } |
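The rows above can be reproduced with a short sketch that uses only java.util.regex rather than the LingPipe classes themselves. The class name LineTokenSketch and its two helper methods are hypothetical names chosen for illustration. The sketch assumes that a token is each match of the expression ".+" under default flags, and that a whitespace is the text before, between, and after those matches.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical stand-alone sketch; it does not call LingPipe.
public class LineTokenSketch {
    // Default flags: '.' does not match line terminators.
    private static final Pattern LINE = Pattern.compile(".+");

    /** Collects each maximal run of non-line-terminator characters. */
    static List<String> tokens(String input) {
        List<String> out = new ArrayList<>();
        Matcher m = LINE.matcher(input);
        while (m.find()) {
            out.add(m.group());
        }
        return out;
    }

    /** Collects the text before the first token, between tokens, and after the last. */
    static List<String> whitespaces(String input) {
        List<String> out = new ArrayList<>();
        Matcher m = LINE.matcher(input);
        int pos = 0;
        while (m.find()) {
            out.add(input.substring(pos, m.start()));
            pos = m.end();
        }
        out.add(input.substring(pos)); // trailing whitespace, possibly empty
        return out;
    }

    public static void main(String[] args) {
        String input = " abc\n def \n";
        System.out.println(tokens(input).size());      // number of tokens
        System.out.println(whitespaces(input).size()); // always one more than the tokens
    }
}
```

Under these assumptions there is always exactly one more whitespace than there are tokens, which matches every row of the table, including the empty input.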
A line tokenizer factory may be compiled. Upon deserialization,
the resulting object will be an instance of
RegExTokenizerFactory. In future versions, the
class of the deserialized object may change, so it is safest to simply
cast it to the interface TokenizerFactory.
This tokenizer factory is nothing more than a convenience
wrapper around a very simple RegExTokenizerFactory, with
the simplest possible regular expression:
RegExTokenizerFactory(".+")
Because the regular expression tokenizer factory takes the
default regular expression flags (see Pattern),
the period (.) matches any character except a line terminator.
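The effect of the default flags can be checked directly against java.util.regex. The following is a minimal sketch; the class name DotFlagSketch and its helper methods are hypothetical and are not part of LingPipe. It counts the matches of ".+" with default flags and again with Pattern.DOTALL, on an input that mixes Unix, Windows, old Macintosh, and Unicode line terminators.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical stand-alone sketch; it does not call LingPipe.
public class DotFlagSketch {
    /** Counts non-overlapping matches of the pattern in the input. */
    private static int count(Pattern p, String input) {
        Matcher m = p.matcher(input);
        int n = 0;
        while (m.find()) {
            n++;
        }
        return n;
    }

    /** Matches with default flags: '.' stops at every line terminator. */
    static int countDefault(String input) {
        return count(Pattern.compile(".+"), input);
    }

    /** Matches with DOTALL: '.' also matches line terminators. */
    static int countDotAll(String input) {
        return count(Pattern.compile(".+", Pattern.DOTALL), input);
    }

    public static void main(String[] args) {
        // \n (Unix), \r\n (Windows), \r (old Macintosh), \u2028 (Unicode line separator)
        String input = "unix\nwindows\r\nmac\rseparator\u2028end";
        System.out.println(countDefault(input)); // one match per line
        System.out.println(countDotAll(input));  // a single match spanning the whole input
    }
}
```

With default flags the input splits into five matches, one per line; with DOTALL the whole input is a single match, which is why the line-level behavior depends on the default flags.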
| Constructor Summary | |
|---|---|
| LineTokenizerFactory() | Construct a line-based tokenizer. |
| Method Summary |
|---|
| Methods inherited from class com.aliasi.tokenizer.RegExTokenizerFactory |
|---|
| compileTo, pattern, tokenizer |
| Methods inherited from class java.lang.Object |
|---|
| clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
| Constructor Detail |
|---|
| public LineTokenizerFactory() |