HTML anchors are often surrounded by text that seems to describe the destination page appropriately. The text surrounding a link, or link-context, is used for a variety of tasks...
The derivation trees of a tree adjoining grammar provide a first insight into the sentence semantics, and are thus prime targets for generation systems. We define a formalism, fea...
Information extraction has conventionally treated HTML pages as plain text documents extended with HTML tags. However, the growing maturity and correct usage of HTML/XHT...
We present a scalable joint language model designed to utilize fine-grain syntactic tags. We discuss challenges such a design faces and describe our solutions that scale well to l...
Semi-structured data such as XML and HTML is attracting considerable attention. It is important to develop various kinds of data mining techniques that can handle semi-structured d...