The Web contains vast amounts of linguistic data. One key issue for linguists and language technologists is how to access it. Commercial search engines give highly compromised acc...
Queries on major Web search engines produce complex result pages, primarily composed of two types of information: organic results, that is, short descriptions and links to relevan...
Cristian Danescu-Niculescu-Mizil, Andrei Z. Broder...
In typical open source software development, developers use revision control systems for product management, mailing list systems for human communications, and bug tracking system...
Measuring the similarity between semantic relations that hold among entities is an important and necessary step in various Web-related tasks such as relation extraction, informati...
In this paper, we address the problem of unsupervised Web data extraction. We show that it becomes feasible when we assume pages that are made up of r...