What is LingPipe?

LingPipe is a suite of Java libraries for the linguistic analysis of human language.

Feature Overview

LingPipe's information extraction and data mining tools:

Architecture

LingPipe's architecture is designed to be efficient, scalable, reusable, and robust. Highlights include:

Latest Release: LingPipe 3.5.1

Patch Release

The latest release of LingPipe is LingPipe 3.5.1. This release replaces LingPipe 3.5.0, with which it is fully backward compatible.

Bug Fixes

Fast Cache

The performance bug in util.FastCache, which affected both speed and size, has been patched. In addition, there's a new cache implementation, util.HardFastCache, which avoids soft references in order to reduce load on the garbage collector.
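The difference between a soft-reference cache and a hard-reference cache can be illustrated with a minimal sketch. This is not LingPipe's implementation: the classes HardCache and SoftCache below are hypothetical, and the one-entry-per-bucket layout is an assumption chosen only to keep the example short.

```java
import java.lang.ref.SoftReference;

public class CacheSketch {

    /** Immutable key/value pair stored in a bucket. */
    public static final class Entry<K, V> {
        final K key;
        final V value;
        Entry(K key, V value) { this.key = key; this.value = value; }
    }

    /** Hypothetical fixed-size cache holding entries by ordinary (strong) reference. */
    public static final class HardCache<K, V> {
        private final Entry<K, V>[] buckets;

        @SuppressWarnings("unchecked")
        public HardCache(int size) { buckets = (Entry<K, V>[]) new Entry[size]; }

        private int index(K key) { return Math.floorMod(key.hashCode(), buckets.length); }

        // A colliding key simply overwrites the previous entry in its bucket.
        public void put(K key, V value) { buckets[index(key)] = new Entry<>(key, value); }

        public V get(K key) {
            Entry<K, V> e = buckets[index(key)];
            return (e != null && e.key.equals(key)) ? e.value : null;
        }
    }

    /** Same layout, but each bucket is a SoftReference that the collector may clear. */
    public static final class SoftCache<K, V> {
        private final SoftReference<Entry<K, V>>[] buckets;

        @SuppressWarnings("unchecked")
        public SoftCache(int size) {
            buckets = (SoftReference<Entry<K, V>>[]) new SoftReference[size];
        }

        private int index(K key) { return Math.floorMod(key.hashCode(), buckets.length); }

        public void put(K key, V value) {
            buckets[index(key)] = new SoftReference<>(new Entry<>(key, value));
        }

        public V get(K key) {
            SoftReference<Entry<K, V>> ref = buckets[index(key)];
            // ref.get() returns null once the collector has reclaimed the entry.
            Entry<K, V> e = (ref == null) ? null : ref.get();
            return (e != null && e.key.equals(key)) ? e.value : null;
        }
    }
}
```

In this sketch the soft version lets the JVM reclaim entries under memory pressure, at the cost of one extra reference object per bucket for the collector to track. The hard version keeps its entries until they are overwritten, so its memory use is bounded by the bucket count rather than by the collector.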

XML Element Stack Filter

The class xml.ElementStackFilter was patched to handle SAX implementations that reuse the same attributes object on each callback. The element stack filter now copies the attributes into a local object for later access.
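The general technique of copying attributes before storing them can be sketched with the standard SAX helper class org.xml.sax.helpers.AttributesImpl, whose copy constructor takes a snapshot of an Attributes object. The handler below is a hypothetical illustration, not the code of xml.ElementStackFilter; the class name AttributeCopySketch and the method idsAtClose are assumptions made for the example.

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.AttributesImpl;
import org.xml.sax.helpers.DefaultHandler;

public class AttributeCopySketch extends DefaultHandler {

    // Attributes of the currently open elements, innermost on top.
    private final Deque<Attributes> stack = new ArrayDeque<>();

    // Records "elementName=idValue" at the time each element closes.
    private final List<String> seen = new ArrayList<>();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        // Copy rather than store atts directly: a parser is allowed to reuse
        // the same Attributes instance on the next startElement callback.
        stack.push(new AttributesImpl(atts));
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        Attributes atts = stack.pop();
        seen.add(qName + "=" + atts.getValue("id"));
    }

    /** Parses the XML string and reports each element's id attribute as it closes. */
    public static List<String> idsAtClose(String xml) throws Exception {
        AttributeCopySketch handler = new AttributeCopySketch();
        SAXParserFactory.newInstance().newSAXParser()
            .parse(new InputSource(new StringReader(xml)), handler);
        return handler.seen;
    }
}
```

Without the copy, a parser that recycles its attributes object could leave every stack entry pointing at the attributes of the most recently opened element.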

Missing Demo Files and Libraries

A demo input file for evaluating spell checking was missing and is now included.

Generic Demo XML Dependency

The NekoHTML package we use in our demos to parse HTML input depends on the XercesJ XML libraries. We now include those libraries in the distribution and have added them to the classpaths of all of the demo commands and the web service demos.

Convenience Methods and Generic Specifications

Version 3.5.1 includes a few new convenience methods in several classes, including util.Streams, util.AbstractCommand, util.Arrays, xml.TextAccumulatorHandler, stats.LogisticRegression, and all vector.Vector implementations.

The cache specification in hmm.HmmDecoder now carries generic type specifications. (This change is fully backward compatible; the implementation did not change and generic specifications are optional.)

Generic Model Interface

Version 3.5.1 introduces a generic stats.Model interface. The language model implementations have been retrofitted to implement this interface.
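To show what a generic model interface makes possible, here is a minimal sketch. The interface ProbModel and its two methods are hypothetical stand-ins defined for this example, not a copy of stats.Model; consult the LingPipe Javadoc for the actual method signatures. The toy UniformCharModel is likewise an assumption and is far simpler than a real language model.

```java
public class ModelSketch {

    /** Hypothetical generic model interface, parameterized by the type of object scored. */
    public interface ProbModel<E> {
        double prob(E e);
        double log2Prob(E e);
    }

    /** Toy model: every character is drawn uniformly from an alphabet of fixed size. */
    public static final class UniformCharModel implements ProbModel<CharSequence> {
        private final int alphabetSize;

        public UniformCharModel(int alphabetSize) { this.alphabetSize = alphabetSize; }

        @Override
        public double log2Prob(CharSequence cs) {
            // Each character contributes -log2(alphabetSize) to the total.
            return -cs.length() * (Math.log(alphabetSize) / Math.log(2.0));
        }

        @Override
        public double prob(CharSequence cs) {
            return Math.pow(2.0, log2Prob(cs));
        }
    }

    /** Client code written once against the interface, for any element type. */
    public static <E> double totalLog2Prob(ProbModel<? super E> model, Iterable<E> items) {
        double sum = 0.0;
        for (E item : items) {
            sum += model.log2Prob(item);
        }
        return sum;
    }
}
```

The point of the type parameter is that utility code such as totalLog2Prob can be shared across models over different kinds of objects, with the compiler checking that each model is only asked to score objects of a type it accepts.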