The World-Wide Web consists of a huge number of unstructured documents, but it also contains structured data in the form of HTML tables. We extracted 14.1 billion HTML tables from...
Michael J. Cafarella, Alon Y. Halevy, Daisy Zhe Wa...
Modeling the navigation structure of a dynamic web application is a challenging task because of the presence of dynamic pages. In particular, there are two problems to be dealt wi...
Standard cursive handwriting recognition is based on a language model, mostly a lexicon of possible word hypotheses or character n-grams. The result is a list of word alternatives...
Web logs, or blogs, challenge the notion of authorship. Seemingly, rather than a model in which the author's writings are themselves a contribution, the blog author weaves a ...
The recent explosion of on-line information in Digital Libraries and on the World Wide Web has given rise to a number of query-based search engines and manually constructed topica...
Mehran Sahami, Salim Yusufali, Michelle Q. Wang Ba...