About the Sentence Demo
Sentences form small, coherent units of text. Operating at the sentence level makes downstream extraction tasks, such as tokenization, entity mention extraction, part-of-speech tagging, and relation discovery, faster and more focused. LingPipe extracts sentences heuristically by identifying tokens in context that end sentences.
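To make the idea of "tokens in context that end sentences" concrete, here is a minimal sketch in Python. It is not LingPipe's model: the punctuation set, the abbreviation list, and the rule that the next token must start with an uppercase letter are all illustrative assumptions, and the function name `split_sentences` is hypothetical.

```python
# Illustrative assumptions, not LingPipe's actual rules.
SENTENCE_FINAL = {".", "?", "!"}
ABBREVIATIONS = {"mr.", "mrs.", "dr.", "prof.", "e.g.", "i.e."}


def split_sentences(text):
    """Split text into sentences using a token-in-context heuristic."""
    tokens = text.split()  # whitespace tokenization keeps the sketch small
    sentences, current = [], []
    for i, tok in enumerate(tokens):
        current.append(tok)
        is_last = i == len(tokens) - 1
        # The token itself must look sentence-final and not be a known abbreviation.
        ends = tok[-1] in SENTENCE_FINAL and tok.lower() not in ABBREVIATIONS
        # The context check: the following token should start a new sentence.
        next_ok = is_last or tokens[i + 1][0].isupper()
        if ends and next_ok:
            sentences.append(" ".join(current))
            current = []
    if current:  # trailing text without final punctuation
        sentences.append(" ".join(current))
    return sentences
```

A genre-specific model would differ mainly in which tokens it treats as possible stops and which contexts it disallows.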
Genre-Specific Models
These demos provide examples of sentence extraction using the package com.aliasi.sentence. There are two sentence models included with LingPipe, one for English news and one for English biomedical text.
Sentence XML Markup
For this demo, sentences are marked by putting their text inside a specified element, s, with an attribute i providing an order identifier, counting from zero; e.g., <s i="2">.
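The markup convention can be sketched in Python as follows. The helper names `mark_sentences` and `read_sentences`, the single space between elements, and the wrapping `<doc>` root used for parsing are assumptions made for illustration; the demo's actual output wrapper is not described here.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape


def mark_sentences(sentences):
    """Wrap each sentence in <s i="N">...</s>, numbering from zero."""
    return " ".join(
        f'<s i="{i}">{escape(text)}</s>' for i, text in enumerate(sentences)
    )


def read_sentences(fragment):
    """Recover (index, text) pairs from marked-up text."""
    # <doc> is a hypothetical root so the fragment parses as one XML document.
    root = ET.fromstring(f"<doc>{fragment}</doc>")
    return [(int(s.get("i")), s.text or "") for s in root.iter("s")]
```

For example, `mark_sentences(["See Spot.", "See Spot run."])` yields two `s` elements with `i="0"` and `i="1"`.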
Sentence Demo on the Web
The demos are hosted on the web at the following URLs:
Sentence Demo: English News Text
http://lingpipe-demos.com:8080/lingpipe-demos/sentence_en_news/textInput.html
Sentence Demo: English Biomedical Text
http://lingpipe-demos.com:8080/lingpipe-demos/sentence_en_bio/textInput.html
For detailed information about using web demos, including web form, file upload, and web service instructions, see the web demo instructions.
Sentence Demo via GUI
To launch the demo in a GUI, first change directories to the command directory and then invoke the demo batch script. Note: Parameters are set in the GUI, not as arguments to the launch script.
Windows Operating System
English News
> cd %LINGPIPE_HOME%\demos\generic\bin
> gui_sentence_en_news.bat
English Biomedical
> cd %LINGPIPE_HOME%\demos\generic\bin
> gui_sentence_en_bio.bat
Unix-like Operating Systems
English News
> cd $LINGPIPE_HOME/demos/generic/bin
> sh gui_sentence_en_news.sh
English Biomedical
> cd $LINGPIPE_HOME/demos/generic/bin
> sh gui_sentence_en_bio.sh
For detailed information about running demos in a GUI, see the GUI demo instructions.
Sentence Demo via Shell Command
Shell commands may be run over a single file, over all of the files in a directory, or through standard input/output.
Running over a Directory
English News
> cd $LINGPIPE/demos/generic/bin
> cmd_sentence_en_news.bat -inDir=../../data/testdir -outDir=/testout
English Biomedical
> cd $LINGPIPE/demos/generic/bin
> cmd_sentence_en_bio.bat -inDir=../../data/testdir -outDir=/testout
Running a Single File
English News
> cd $LINGPIPE/demos/generic/bin
> cmd_sentence_en_news.bat -inFile=../../data/testdir/foo.txt -outFile=foo.out.xml
Running through a Pipe (Standard input/output)
English News
> cd demos/generic/bin
> echo See Spot. See Spot run. | cmd_sentence_en_news.bat
Running in Unix-like Operating Systems
For Unix-like operating systems such as Unix, Solaris, Linux, or Macintosh OS X:
- Replace path backward slashes (\) with forward slashes (/), and
- substitute .sh for the .bat suffix in the command.
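The two substitutions can be sketched as a small Python helper. The function name `to_unix_command` is hypothetical, and the sketch only performs the two replacements listed above; it does not translate environment variable syntax or add the `sh` prefix used in the GUI examples.

```python
import re


def to_unix_command(windows_command):
    """Apply the two substitutions: backslashes to slashes, .bat to .sh."""
    # Replace path backward slashes with forward slashes.
    converted = windows_command.replace("\\", "/")
    # Swap the .bat suffix for .sh only where it ends a script name.
    return re.sub(r"\.bat\b", ".sh", converted)
```

For instance, `to_unix_command(r"cmd_sentence_en_news.bat -inDir=..\..\data\testdir")` returns the same command with `.sh` and forward slashes.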
For detailed information about running demos from the command line, see the command line demo instructions.
Sentence Demo Scripts
The following scripts are available in $LINGPIPE/demos/generic/bin for running the demo. Note that each script comes in four flavors, distinguishing command line from GUI, and the Windows DOS shell from the Unix shell.
Language | Genre | Mode | Windows DOS | Unix/Linux/Mac sh |
---|---|---|---|---|
English | News | Command | cmd_sentence_en_news.bat | cmd_sentence_en_news.sh |
English | News | GUI | gui_sentence_en_news.bat | gui_sentence_en_news.sh |
English | Biomedical | Command | cmd_sentence_en_bio.bat | cmd_sentence_en_bio.sh |
English | Biomedical | GUI | gui_sentence_en_bio.bat | gui_sentence_en_bio.sh |
Sentence Demo Parameters
The following is a complete list of parameters for the demo.
General Demo Parameters
These parameters apply to every version (web/GUI/command) of every demo.
Parameter | Description | Usage Constraints |
---|---|---|
inCharset | Input character set | Optional. Defaults to platform default. |
outCharset | Output character set | Optional. Defaults to platform default. |
contentType | Input content type | May be one of: text/plain, text/html, text/xml. |
removeElts | Element tags to remove | Optional. May only be used with contentType=text/html or contentType=text/xml. Each value may be a comma-separated list. If neither removeElts nor includeElts is specified, all text content is processed. |
includeElts | Elements to annotate | Same constraints as removeElts. |
Command-Line Only Parameters
These parameters apply to every command-line demo, but are not relevant for the GUI or web versions of the demos.
Parameter | Description | Usage Constraints |
---|---|---|
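inFile | Readable input file | May not be used with inDir. If inFile is not specified, input defaults to standard input. |
outFile | Writeable output file | May not be used with inDir. If outFile is not specified, output defaults to standard output. |
inDir | Readable input directory | May not be used with inFile or outFile. If used, inDir and outDir must both be specified. |
outDir | Writeable output directory | May not be used with inFile or outFile. If used, inDir and outDir must both be specified. |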