RDF is increasingly being used to represent metadata. RDF Site Summary (RSS) is an application of RDF on the Web that has considerably grown in popularity. However, the way RSS sy...
One of the most important steps in web crawling is determining the starting points, or seed selection. This paper identifies and explores the problem of seed selection in webscal...
Statistical machine learning techniques for data classification usually assume that all entities are i.i.d. (independent and identically distributed). However, real-world entities...
Once information retrieval has located a document, and information extraction has provided its contents, how do we know whether we should actually believe it? Fact-finders are a ...
In this paper, we use the structural and relational information on the Web to find entity-pages. Specifically, given a Web site and an entity-page (e.g., department and faculty ...
Tim Weninger, Fabio Fumarola, Cindy Xide Lin, Rick...