In this paper, we examine the discursive situation of Wikipedia. The primary goal is to explore principle ways of analyzing and characterizing the various forms of communicative u...
As part of a large effort to acquire large repositories of facts from unstructured text on the Web, a seed-based framework for textual information extraction allows for weakly sup...
Determining the similarity of short text snippets, such as search queries, works poorly with traditional document similarity measures (e.g., cosine), since there are often few, if...
Abstract. Temporal expressions—references to points in time or periods of time—are widespread in text, and their proper interpretation is essential for any natural language pro...
Challenging the implicit reliance on document collections, this paper discusses the pros and cons of using query logs rather than document collections, as self-contained sources o...