Package com.aliasi.corpus.parsers

Classes for parsing various corpus data formats.

Class Summary
AbstractMedTagParser The AbstractMedTagParser class provides an adapter for NCBI's MedTag corpora, including GeneTag and MedPost.
BrownPosParser The BrownPosParser class provides a parser for the NLTK distribution of the Brown Corpus.
BrownTextParser The BrownTextParser parses the Natural Language Toolkit (NLTK) distribution of the Brown Corpus.
GeneTagChunkParser The GeneTagChunkParser class is designed to parse the offset-annotated first-best GeneTag named entity corpus into a chunk-based representation.
GeneTagParser The GeneTagParser class provides a tag parser for the GeneTag named-entity corpus.
GeniaEntityChunkParser A GeniaEntityChunkParser provides an entity parser for the XML-formatted GENIA entity corpus.
GeniaPosParser The GeniaPosParser extracts the part-of-speech (POS) tags from the GENIA text POS corpus and sends them to the specified tag handler.
GeniaSentenceParser A GeniaSentenceParser provides a chunk parser for the XML version of the GENIA corpus.
GigawordTextParser A text parser for the Linguistic Data Consortium's English Gigaword Corpus.
MedlineTextParser A MedlineTextParser extracts all text from the abstracts of MEDLINE citations, passing them to the contained text handler.
MedPostPosParser The MedPostPosParser class provides a parser for the MedPost part-of-speech corpus.
Muc6ChunkParser A Muc6ChunkParser parses MUC6-formatted named-entity corpora in XML.
RegexLineTagParser Provides a means of generating a tag parser that uses regular expressions to extract zone boundaries and token/tag pairs from lines of data.
Reuters21578Parser A Reuters21578Parser provides a parser for the Reuters-21578 text categorization test collection.
SvmLightClassificationParser The SvmLightClassificationParser class parses (a generalization of) the widely used SVMlight format for vector classification.
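
As an illustration of the line-oriented approach described for RegexLineTagParser, the following is a minimal sketch written against plain java.util.regex only. It does not use or reproduce the com.aliasi API: the class name RegexLineTagSketch, the two patterns, and the input format (one whitespace-separated token/tag pair per line, blank lines as zone boundaries) are all assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration; not part of com.aliasi.corpus.parsers.
public class RegexLineTagSketch {

    // Assumed format: a token/tag line is "token<whitespace>tag".
    static final Pattern TOKEN_TAG = Pattern.compile("^(\\S+)\\s+(\\S+)$");
    // Assumed format: a blank line marks a zone (e.g. sentence) boundary.
    static final Pattern ZONE_BOUNDARY = Pattern.compile("^\\s*$");

    /** Returns one list of {token, tag} pairs per zone. */
    static List<List<String[]>> parse(List<String> lines) {
        List<List<String[]>> zones = new ArrayList<>();
        List<String[]> current = new ArrayList<>();
        for (String line : lines) {
            if (ZONE_BOUNDARY.matcher(line).matches()) {
                // Close the current zone, skipping empty zones.
                if (!current.isEmpty()) {
                    zones.add(current);
                    current = new ArrayList<>();
                }
                continue;
            }
            Matcher m = TOKEN_TAG.matcher(line);
            if (m.matches()) {
                current.add(new String[] { m.group(1), m.group(2) });
            }
            // Lines matching neither pattern are ignored.
        }
        if (!current.isEmpty()) {
            zones.add(current);
        }
        return zones;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("The DT", "dog NN", "", "Barks VBZ");
        for (List<String[]> zone : parse(lines)) {
            StringBuilder sb = new StringBuilder();
            for (String[] pair : zone) {
                sb.append(pair[0]).append('/').append(pair[1]).append(' ');
            }
            System.out.println(sb.toString().trim());
        }
    }
}
```

A real corpus would likely need different patterns; the point of the sketch is only that two regular expressions, one for boundaries and one for token/tag pairs, are enough to drive a line-by-line tag parser.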
 
