Two dimensional plots (2-D) in digital documents on the web are an important source of information that is largely under-utilized. In this paper, we outline how data and text can ...
Saurabh Kataria, William Browuer, Prasenjit Mitra,...
Wrapper is a traditional method to extract useful information from Web pages. Most previous works rely on the similarity between HTML tag trees and induced template-dependent wrap...
Selective sampling, a form of active learning, reduces the cost of labeling training data by asking only for the labels of the most informative unlabeled examples. We introduce a ...
Facebook recently deployed Facebook Messages, its first ever user-facing application built on the Apache Hadoop platform. Apache HBase is a database-like layer built on Hadoop des...
Dhruba Borthakur, Jonathan Gray, Joydeep Sen Sarma...
This paper describes one solution to the problem of how to select sequence, and link Web resources into a coherent, focused organization for instruction that addresses a user'...
Robert G. Farrell, Soyini D. Liburd, John C. Thoma...